Introduction to Artificial Intelligence - Homework03
- NETID:
- Name:
This assignment covers the following topics:
- Bayesian Statistics and Networks
Complete all sections. Some questions will require written answers, while others will involve coding. Be sure to run your code cells to verify your solutions.
You will likely find it useful to keep the scikit-learn documentation for the Naive Bayes module open while you work. Run the cell below this one to see a visualization of a simple Naive Bayes network for the Iris dataset.
import matplotlib.pyplot as plt
import networkx as nx
from pgmpy.models import NaiveBayes
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
import numpy as np
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Fit a Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X, y)
# Extract the mean and standard deviation from the model
means = gnb.theta_ # Mean of each feature per class
std_devs = np.sqrt(gnb.var_) # Standard deviation of each feature per class
# Create a Naive Bayes model using pgmpy
model = NaiveBayes()
model.add_edges_from([('Species', 'SepalLength'),
                      ('Species', 'SepalWidth'),
                      ('Species', 'PetalLength'),
                      ('Species', 'PetalWidth')])
# Create a networkx graph from the edges of the Naive Bayes model
G = nx.DiGraph()
G.add_edges_from(model.edges())
# Define a layout for the nodes
pos = nx.spring_layout(G)
# Visualize the model using networkx
plt.figure(figsize=(12, 8))
nx.draw(G, pos, with_labels=True, node_size=2000, node_color='lightblue', font_size=12, font_weight='bold', arrows=True)
# Annotate nodes with summary statistics
features = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
species = iris.target_names
for i, feature in enumerate(features):
    # Build one text block per feature, listing mean and std dev for every class
    annotation = ""
    for j, specie in enumerate(species):
        annotation += f"{specie}:\nμ={means[j, i]:.2f}, σ={std_devs[j, i]:.2f}\n"
    # Place the block just below the feature's node
    plt.text(pos[feature][0], pos[feature][1]-0.2, annotation, fontsize=8, bbox=dict(facecolor='white', alpha=0.8))
plt.title('Naive Bayes Network with Summary Statistics')
plt.show()
1.) Explain in your own words what Bayes' Rule is useful for.
2.) Creating and running a simple Naive Bayes classifier
Fill in the code below at the indicated spots!
# %pip install scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create and train the Gaussian Naive Bayes model
model = GaussianNB()
model.fit(X_train, y_train)
# Make predictions
# Calculate accuracy
# Generate a classification report
Accuracy: 97.78%
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      0.92      0.96        13
   virginica       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
3.) Explain in your own words how Naive Bayes makes its decisions.
4.) Manually classifying a sample
Use the Gaussian probability density function shown in class to predict the class probabilities for the single sample given below. This will require you to compute the likelihood of each feature value under each class, and then to use the fact that the features are treated as conditionally independent given the class to get the overall likelihood for each class. Use the mu and sigma values from the visualization above.
Sample = {Sepal Length: 5.5, Sepal Width: 3.0, Petal Length: 3.5, Petal Width: 1.6}
Note: Using the predict_proba function of the model that we visualized above, I got:
[[1.82646146e-66 9.99474514e-01 5.25486076e-04]]
Your numbers may differ slightly because of rounding.
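If the mechanics of question 4 feel abstract, here is a minimal sketch of the same procedure on made-up numbers. Everything in it is an assumption for illustration: the function names `gaussian_pdf` and `naive_bayes_posteriors`, the two classes "A" and "B", and their mu/sigma values are hypothetical and are not the Iris parameters. It only shows the shape of the calculation: evaluate the Gaussian density per feature, multiply across features, weight by the class prior, then normalize.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density at x of a normal distribution with mean mu and std dev sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def naive_bayes_posteriors(sample, params, priors):
    """Return normalized class probabilities for a single sample.

    sample: list of feature values
    params: {class_name: [(mu, sigma), ...]}, one pair per feature
    priors: {class_name: prior probability}
    """
    scores = {}
    for cls, feature_params in params.items():
        # Conditional independence: the class likelihood is the product
        # of the per-feature densities.
        likelihood = 1.0
        for x, (mu, sigma) in zip(sample, feature_params):
            likelihood *= gaussian_pdf(x, mu, sigma)
        scores[cls] = priors[cls] * likelihood
    # Normalize so the class probabilities sum to 1.
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical toy parameters (NOT the Iris values): two classes, two features.
toy_params = {
    "A": [(1.0, 0.5), (2.0, 1.0)],
    "B": [(3.0, 0.5), (4.0, 1.0)],
}
toy_priors = {"A": 0.5, "B": 0.5}
toy_sample = [1.2, 2.5]

posteriors = naive_bayes_posteriors(toy_sample, toy_params, toy_priors)
for cls, p in posteriors.items():
    print(f"P({cls} | sample) = {p:.6f}")
```

For your own answer you would swap in the three Iris classes, the four mu/sigma pairs per class from the visualization, and priors based on how often each class appears in the data. With many features the product of densities can get very small, which is why implementations often add up log-densities instead of multiplying.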
5.) Explain where the mu and sigma values come from for each feature and each class.
For Fun
Here's a cool link about writing your own email spam filter using Naive Bayes