Welcome to CSE 30124 - Introduction to Artificial Intelligence
This packet is intended to be a simple review of material I will expect familiarity with (due to prereqs!) and a brief introduction to some of the libraries we will be using throughout the course. This packet is not comprehensive, but completing the packet will certainly help you prepare for class.
Introduction: Jupyter Notebooks
Review: Linear Transformations
Introduction: Probability Distributions
Introduction: Data Distributions
Introduction: Function Approximation
Introduction: Python Libraries for AI
Jupyter Notebooks allow you to write and execute Python code in an interactive environment. They consist of cells that can either contain text (Markdown) or code.
Try running the code below by selecting the cell and pressing Shift + Enter.
print('Hello, CSE 30124')
Markdown cells are used to write and format text. You can use Markdown to add headings, lists, links, images, and even LaTeX equations to your notebook.
Here's an example of a Markdown list:
**Bold text**
*Italic text*
[Link](https://www.example.com)
You can also add images using Markdown:
![Image description](https://www.example.com/image.png)
Try editing this cell and adding some of your own Markdown elements!
Now that you know the basics of Markdown and Code cells, it's time for you to create your own pieces of a notebook.
Good luck, and welcome to CSE 30124!
Matrix multiplication is a fundamental operation in linear algebra that can be interpreted as a linear transformation.
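For example, here's a small NumPy sketch (the matrix and the vector are just illustrative values) showing a 2x2 matrix acting on a vector as a linear transformation:
import numpy as np

# A 90-degree counter-clockwise rotation matrix (illustrative values)
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
v = np.array([2.0, 1.0])  # an arbitrary 2D vector

# Matrix multiplication applies the transformation to the vector
print(A @ v)  # [-1.  2.] -- the vector rotated 90 degrees counter-clockwise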
In mathematics, particularly in linear algebra, the concepts of vector spaces and subspaces are foundational for understanding the structure and behavior of vectors.
A vector space is a collection of vectors that can be added together and multiplied by scalars (real numbers), satisfying certain axioms. These axioms include associativity, commutativity of addition, existence of an additive identity (zero vector), and distributive properties of scalar multiplication.
Consider the set of all possible positions of a drone in a 3D space. Each position can be represented as a vector (x, y, z), where x, y, and z are coordinates in space. The set of all such vectors forms a vector space because you can add two position vectors or scale a position vector by a real number to get another valid position vector.
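As a quick sketch (the positions are made up), NumPy makes these vector space operations one-liners:
import numpy as np

# Two made-up drone positions in 3D space
p1 = np.array([1.0, 2.0, 3.0])
p2 = np.array([4.0, 0.0, 5.0])

# Adding vectors and scaling by a real number both give valid positions
print(p1 + p2)   # [5. 2. 8.]
print(2.5 * p1)  # [2.5 5.  7.5]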
A subspace is a subset of a vector space that is itself a vector space under the same operations of addition and scalar multiplication. For a subset to be a subspace, it must include the zero vector, be closed under vector addition, and be closed under scalar multiplication.
Continuing with the drone example, imagine the drone is restricted to fly only at ground level, z = 0. The set of all position vectors (x, y, 0) forms a subspace of the original 3D vector space: it contains the zero vector and is closed under addition and scalar multiplication. This subspace is essentially a 2D plane within the 3D space, where the drone can move freely in the x and y directions but not in the z direction. (Note that a plane at a fixed nonzero altitude, say z = 10, is not quite a subspace, because it does not contain the zero vector.)
While it's easy to visualize a drone in 3D space, you may be wondering why this matters. In AI, we often aren't operating on data that lives in a "real" physical space. Instead, our data is highly abstract.
A classic data set in Machine Learning is the Iris data set. How would you describe a flower to a computer? Representation of data is one of the trickiest and most important things in AI.
One way we could represent a flower in the computer is to give it four different measurements based on attributes of the flower. In the Iris dataset we have four measurements: Petal Length, Petal Width, Sepal Length, and Sepal Width.
This lets us describe each iris to the computer in a way that allows us to compare between the three types of irises in the data.
More importantly however, we could imagine each Iris being a point in 4D space! Each axis of this space corresponds to one of our 4 measurements, and by combining all four measurements used to describe a single flower, we could plot a point in space to represent this flower! This also means that if we could somehow compute the distance between two points in 4D space we could tell how different or similar any two given flowers are.
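Here's a small sketch of that idea. The measurements are hypothetical, and the distance used is the ordinary Euclidean distance:
import numpy as np

# Hypothetical measurements for two flowers:
# [petal length, petal width, sepal length, sepal width] in cm
flower_a = np.array([1.4, 0.2, 5.1, 3.5])
flower_b = np.array([4.7, 1.4, 7.0, 3.2])

# Euclidean distance between two points in 4D space
distance = np.linalg.norm(flower_a - flower_b)
print(f"Distance between the two flowers: {distance:.2f}")
The smaller the distance, the more similar the two flowers are.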
It's pretty hard to visualize 4D space, but it's even harder to visualize 12,288-dimensional space, which is how many dimensions a word embedding has in GPT-3. The important takeaway here is that regardless of what our dataset is, we can usually represent it as a collection of points in space, which allows us to use linear algebra to learn things about our data.
Run the code cell below to see your first neural network in action! We will dive into this example during multiple lectures this semester, but it is a nice visualization of the use of linear algebra to transform input data to allow us to make decisions!
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, clear_output
import torch
import torch.nn as nn
import torch.optim as optim
plt.rcParams['text.usetex'] = True

# Generate synthetic 3D input data
np.random.seed(42)
n_samples = 100
X = np.random.randn(n_samples, 3)  # 3D input
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # Simple binary classification (based on a plane)

# Convert to PyTorch tensors
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

# Define the neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.hidden = nn.Linear(3, 2)  # Reduce 3D to 2D
        self.output = nn.Linear(2, 1)  # Reduce 2D to 1D

    def forward(self, x):
        hidden = self.hidden(x)
        out = self.output(hidden)
        return hidden, out

# Initialize the model, loss, and optimizer
model = SimpleNN()
criterion = nn.BCEWithLogitsLoss()  # Binary classification loss
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Metrics to track
losses = []
accuracies = []

# Training loop
n_epochs = 100
for epoch in range(1, n_epochs + 1):
    model.train()
    optimizer.zero_grad()

    # Forward pass
    hidden_out, output = model(X_tensor)
    loss = criterion(output, y_tensor)

    # Backward pass and optimization
    loss.backward()
    optimizer.step()

    # Compute accuracy
    with torch.no_grad():
        predictions = torch.sigmoid(output) > 0.5  # Binary predictions
        accuracy = (predictions == y_tensor).float().mean().item()

    # Track metrics
    losses.append(loss.item())
    accuracies.append(accuracy)

    # Convert outputs to NumPy for visualization
    hidden_out_np = hidden_out.detach().numpy()
    output_np = torch.sigmoid(output).detach().numpy()  # Apply sigmoid for probabilities

    # Select the first sample to show real inputs and outputs
    sample_idx = 0
    x_sample = X[sample_idx]                   # Real input
    hidden_sample = hidden_out_np[sample_idx]  # Hidden layer output
    output_sample = output_np[sample_idx][0]   # Final output

    # Visualization setup
    fig, axs = plt.subplots(2, 3, figsize=(18, 10))

    # Create 3D subplot for the input space
    ax_3d = fig.add_subplot(231, projection='3d')
    ax_2d = axs[0, 1]
    ax_1d = axs[0, 2]

    colors = ['red' if label == 0 else 'blue' for label in y]

    # 3D space
    ax_3d.scatter(X[:, 0], X[:, 1], X[:, 2], c=colors, alpha=0.7)
    ax_3d.scatter(x_sample[0], x_sample[1], x_sample[2], c='yellow', s=100, edgecolor='black', label='Highlighted Sample')
    ax_3d.set_title(f"3D Input Space (Epoch {epoch})")
    ax_3d.set_xlabel("X1")
    ax_3d.set_ylabel("X2")
    ax_3d.set_zlabel("X3")
    ax_3d.legend()

    # 2D hidden space
    ax_2d.scatter(hidden_out_np[:, 0], hidden_out_np[:, 1], c=colors, alpha=0.7)
    ax_2d.scatter(hidden_sample[0], hidden_sample[1], c='yellow', s=100, edgecolor='black', label='Highlighted Sample')
    ax_2d.set_title("2D Hidden Layer with Decision Boundary")
    ax_2d.set_xlabel("H1")
    ax_2d.set_ylabel("H2")
    ax_2d.legend()

    # 1D output space
    ax_1d.scatter(output_np[:, 0], np.zeros_like(output_np[:, 0]), c=colors, alpha=0.7)
    ax_1d.scatter(output_sample, 0, c='yellow', s=100, edgecolor='black', label='Highlighted Sample')
    ax_1d.axvline(0.5, color='green', linestyle='--', label='Decision Boundary (0.5)')
    ax_1d.set_title("1D Output Space with Decision Boundary")
    ax_1d.set_xlabel("Output")
    ax_1d.legend()

    # Learning curve plot
    axs[1, 0].plot(range(1, epoch + 1), losses, label="Loss", color='blue')
    axs[1, 0].set_title("Learning Curve")
    axs[1, 0].set_xlabel("Epoch")
    axs[1, 0].set_ylabel("Loss")
    axs[1, 0].legend()

    # Accuracy plot
    axs[1, 1].plot(range(1, epoch + 1), accuracies, label="Accuracy", color='green')
    axs[1, 1].set_title("Accuracy Curve")
    axs[1, 1].set_xlabel("Epoch")
    axs[1, 1].set_ylabel("Accuracy")
    axs[1, 1].legend()

    # Display weights, biases, and computation in LaTeX-style
    axs[1, 2].axis('off')  # Turn off axis

    # Highlighted sample (we'll use the first sample for simplicity)
    sample_idx = 0
    x_sample = X[sample_idx]                   # Real input for the highlighted sample
    hidden_sample = hidden_out_np[sample_idx]  # Hidden layer output for the sample
    output_sample = output_np[sample_idx][0]   # Final output for the sample

    # Generate the weight matrices and biases as NumPy arrays for computation
    hidden_weights = model.hidden.weight.detach().numpy()
    hidden_biases = model.hidden.bias.detach().numpy()
    output_weights = model.output.weight.detach().numpy()
    output_bias = model.output.bias.detach().numpy()

    # Perform manual computations for display
    hidden_layer_result = hidden_weights @ x_sample + hidden_biases
    output_layer_result = output_weights @ hidden_sample + output_bias

    # Generate plain-text equations with better alignment and spacing
    equation_text = (
        f"Hidden Layer:\n"
        f"  [{hidden_weights[0,0]:.2f}, {hidden_weights[0,1]:.2f}, {hidden_weights[0,2]:.2f}] * [{x_sample[0]:.2f}]\n"
        f"  [{hidden_weights[1,0]:.2f}, {hidden_weights[1,1]:.2f}, {hidden_weights[1,2]:.2f}]   [{x_sample[1]:.2f}]\n"
        f"                                [{x_sample[2]:.2f}]\n"
        f"+ [{hidden_biases[0]:.2f}]\n"
        f"  [{hidden_biases[1]:.2f}]\n"
        f"= [{hidden_layer_result[0]:.2f}]\n"
        f"  [{hidden_layer_result[1]:.2f}]\n\n"
        f"Output Layer:\n"
        f"  [{output_weights[0,0]:.2f}, {output_weights[0,1]:.2f}] * [{hidden_sample[0]:.2f}]\n"
        f"                       [{hidden_sample[1]:.2f}]\n"
        f"+ [{output_bias[0]:.2f}]\n"
        f"= [{output_layer_result[0]:.2f}]\n\n"
        f"Sigmoid Output: Sigma({output_layer_result[0]:.2f}) = {output_sample:.2f}"
    )

    axs[1, 2].axis('off')  # Turn off the subplot axes
    axs[1, 2].text(0.1, 0.5, equation_text, fontsize=10, verticalalignment='center', transform=axs[1, 2].transAxes)

    # Update the visualization
    clear_output(wait=True)
    display(fig)
    plt.close(fig)

    # Print metrics
    print(f"Epoch {epoch}/{n_epochs}, Loss: {loss.item():.4f}, Accuracy: {accuracy:.4f}")

# Final static display
plt.show()
Don't worry if you don't understand either the code or the visualization; that's the point of this class! By the end of the semester this will be as easy as programming a fractal.
Probability distributions describe how probabilities are distributed over events.
Simulate rolling a die 1000 times and plot the frequency of each outcome.
import numpy as np
import matplotlib.pyplot as plt
# Simulate rolling a die
rolls = np.random.randint(1, 7, size=1000)

# Plot the frequency
plt.hist(rolls, bins=np.arange(1, 8) - 0.5, edgecolor='black')
plt.title("Die Roll Frequencies")
plt.xlabel("Die Value")
plt.ylabel("Frequency")
plt.xticks(range(1, 7))
plt.show()
This is an example of a uniform distribution, where every outcome is equally likely. However, a more common distribution is the normal, or Gaussian, distribution. Consider the Iris dataset above and imagine gathering 1,000,000 Iris setosas. If you measured all of their petals and then plotted the measurements, what you'd likely discover is a normal distribution of petal lengths. There is some sort of "platonic" petal length for a setosa, and the flowers don't usually deviate far from that ideal, average length.
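We can simulate this thought experiment. The mean and standard deviation below are assumed, illustrative values, not real measurements:
import numpy as np
import matplotlib.pyplot as plt

# Simulate petal lengths for 1,000,000 imaginary setosas
# (mean of 1.5 cm and standard deviation of 0.2 cm are made-up values)
petal_lengths = np.random.normal(loc=1.5, scale=0.2, size=1_000_000)

plt.hist(petal_lengths, bins=100, edgecolor='black')
plt.title("Simulated Setosa Petal Lengths")
plt.xlabel("Petal Length (cm)")
plt.ylabel("Frequency")
plt.show()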
Imagine you were given a new flower and you wanted to figure out which of the three types of iris it was. What's one thing you could try?
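One simple idea: since each species clusters around its own average measurements, you could compare the new flower to each species' average and pick the closest one. Here's a rough sketch with made-up class means:
import numpy as np

# Made-up average measurements per species:
# [petal length, petal width, sepal length, sepal width] in cm
class_means = {
    "setosa":     np.array([1.5, 0.2, 5.0, 3.4]),
    "versicolor": np.array([4.3, 1.3, 5.9, 2.8]),
    "virginica":  np.array([5.5, 2.0, 6.6, 3.0]),
}

new_flower = np.array([4.5, 1.4, 6.0, 2.9])  # hypothetical new flower

# Pick the species whose average is closest to the new flower
closest = min(class_means, key=lambda name: np.linalg.norm(new_flower - class_means[name]))
print("Predicted species:", closest)  # versicolor for these made-up numbers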
The idea of data distributions is a powerful one, and understanding it is important for mitigating bias in artificial intelligence.
Much like a probability distribution, a data distribution is a sort of description of the data you have. A question I asked on the final exam last semester was:
A Norwegian company has a face identification tool that they trained on participants in yoga classes near their offices. They are about to roll out their model globally, do you think this model will work well?
It's important here to consider the distribution of the training data used for the AI model. Was the training data distribution representative of the global distribution? In this case, of course not.
Police Facial Recognition Technology Can’t Tell Black People Apart
This is a colossal issue in AI: a model is only as good as the data you feed it. Data distributions and the representativeness of the data you have access to are extremely important to keep in mind as you design models. You can never have all the training data that could ever exist; you could never possibly measure every single iris. But hopefully you can collect a dataset that is representative enough to model the underlying "functions" that generate irises.
This leads into the idea of function approximation. Most of what we do in machine learning is learn a model from our training data that approximates the hidden function that generated that data.
Take, for example, the relationship between caloric intake and blood sugar. This is an extremely complex relationship, but there is some hidden function, known exactly only to God, that maps from caloric intake to blood sugar exactly. However, we're only human, so all we can do is collect a bunch of samples of caloric intake and blood sugar measurements. Imagine we then plotted these samples:
What we can do is train a model to fit our dataset. If we only had the data points and were given a new caloric intake value, we couldn't predict the expected blood sugar. By fitting a model to our data, though, we can try to approximate the hidden function, and if our approximation is good we can mostly predict the expected blood sugar given a new caloric intake.
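Here's a toy sketch of that whole pipeline. The "hidden" function, the noise level, and the choice of a cubic polynomial model are all made up for illustration:
import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial import Polynomial

# Pretend this is the hidden function relating caloric intake to blood sugar
def hidden_function(calories):
    return 80 + 0.02 * calories + 5 * np.sin(calories / 300)

# Collect noisy "measurements" by sampling the hidden function
np.random.seed(0)
calories = np.random.uniform(1500, 3500, size=50)
blood_sugar = hidden_function(calories) + np.random.normal(0, 2, size=50)

# Fit a simple model (a cubic polynomial) to approximate the hidden function
model = Polynomial.fit(calories, blood_sugar, deg=3)

# Use the approximation to predict blood sugar for a new caloric intake
new_intake = 2800
print(f"Predicted blood sugar at {new_intake} calories: {model(new_intake):.1f}")

# Plot the samples, the hidden function, and our approximation of it
xs = np.linspace(1500, 3500, 200)
plt.scatter(calories, blood_sugar, label="Samples")
plt.plot(xs, hidden_function(xs), color='green', label="Hidden function")
plt.plot(xs, model(xs), color='red', linestyle='--', label="Fitted model")
plt.xlabel("Caloric Intake")
plt.ylabel("Blood Sugar")
plt.legend()
plt.show()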
Much of modern AI can be thought of as just a function approximation task.
Neural Networks are Function Approximation Algorithms
This article goes into more depth if you're interested.
Python has become the de facto language for AI and machine learning, largely due to its powerful ecosystem of specialized libraries. Let's explore the three most fundamental libraries you'll use in your AI journey.
NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays. The code blocks below provide a number of examples of basic NumPy operations. I don't expect you to memorize these, but it would be good for you to be at least vaguely familiar with both the syntax and what's possible.
#always import it as np
import numpy as np
# Create a 1D array
array_1d = np.array([1, 2, 3, 4])
print("1D Array:", array_1d)

# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:\n", array_2d)

# Create arrays filled with zeros or ones
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))
print("\nZeros:\n", zeros)
print("\nOnes:\n", ones)

# Create an array with a range of numbers
range_array = np.arange(0, 10, 2)  # start, stop, step
print("\nRange Array:", range_array)

# Create an array with evenly spaced values
linspace_array = np.linspace(0, 1, 5)  # start, stop, number of points
print("\nLinspace Array:", linspace_array)

array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])

# Element-wise addition
print("Addition:", array_a + array_b)

# Element-wise multiplication
print("Multiplication:", array_a * array_b)

# Broadcasting: Adding a scalar to an array
print("Add scalar:", array_a + 10)

# Element-wise square
print("Square:", array_a ** 2)

array = np.array([10, 20, 30, 40, 50])

# Indexing
print("First element:", array[0])

# Slicing
print("Slice (1:4):", array[1:4])

# Modifying elements
array[0] = 99
print("Modified Array:", array)

array = np.arange(1, 10)          # Array with values 1 to 9
reshaped = array.reshape((3, 3))  # Reshape to 3x3
print("Original Array:", array)
print("\nReshaped Array:\n", reshaped)

array = np.array([1, 2, 3, 4, 5])

# Sum of elements
print("Sum:", np.sum(array))

# Mean and standard deviation
print("Mean:", np.mean(array))
print("Standard Deviation:", np.std(array))

# Maximum and minimum
print("Max:", np.max(array))
print("Min:", np.min(array))

# Random numbers between 0 and 1
random_array = np.random.rand(3, 3)
print("Random Array:\n", random_array)

# Random integers
random_ints = np.random.randint(0, 10, (2, 3))  # range [0, 10), shape (2, 3)
print("\nRandom Integers:\n", random_ints)
NumPy is significantly faster than plain Python for mathematical, matrix, and vector operations.
import numpy as np
import time
# Generate a large list and a NumPy array with the same values
size = 10**6  # 1 million elements
python_list = list(range(size))
numpy_array = np.arange(size)

# Pure Python: Compute the square of each element
start_time = time.time()
python_result = [x ** 2 for x in python_list]
python_time = time.time() - start_time
print(f"Pure Python took {python_time:.5f} seconds")

# NumPy: Compute the square of each element
start_time = time.time()
numpy_result = numpy_array ** 2
numpy_time = time.time() - start_time
print(f"NumPy took {numpy_time:.5f} seconds")

# Print the speedup
speedup = python_time / numpy_time
print(f"NumPy is approximately {speedup:.2f}x faster!")
Pandas provides high-performance, easy-to-use data structures and tools for working with structured data. It's particularly good at handling tabular data with heterogeneously-typed columns. It's sort of like a super-advanced dictionary crossed with a spreadsheet.
import pandas as pd
# Create a DataFrame from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Score": [85, 90, 95, 100],
}
df = pd.DataFrame(data)
print("DataFrame:\n", df)

# Access a single column
print("Names:\n", df["Name"])

# Add a new column
df["Pass"] = df["Score"] >= 90
print("\nUpdated DataFrame:\n", df)

# Filter rows where 'Age' is greater than 30
filtered_df = df[df["Age"] > 30]
print("\nFiltered Rows:\n", filtered_df)

# Using a for loop to calculate grades
grades = []
for score in df["Score"]:
    if score >= 90:
        grades.append("A")
    else:
        grades.append("B")
df["Grade"] = grades
print("Grades with loop:\n", df)

# Vectorized operation using pandas `apply` and a lambda
df["Grade"] = df["Score"].apply(lambda x: "A" if x >= 90 else "B")
print("\nGrades with Pandas:\n", df)

# Group data by 'Grade' and calculate average age
grouped = df.groupby("Grade")["Age"].mean()
print("\nAverage Age by Grade:\n", grouped)
import pandas as pd
import numpy as np
import time
# Create a large DataFrame
size = 10**6
data = pd.DataFrame({
    "A": np.random.rand(size),
    "B": np.random.rand(size),
})

# Python loop: Add two columns
start_time = time.time()
data["C_loop"] = [data["A"][i] + data["B"][i] for i in range(size)]
python_time = time.time() - start_time
print(f"Python loop took {python_time:.5f} seconds")

# Pandas vectorized operation
start_time = time.time()
data["C_vectorized"] = data["A"] + data["B"]
pandas_time = time.time() - start_time
print(f"Pandas vectorized operation took {pandas_time:.5f} seconds")

# Speedup
speedup = python_time / pandas_time
print(f"Pandas is approximately {speedup:.2f}x faster!")
Scikit-learn is the most popular machine learning library in Python. It provides a consistent interface for a wide range of machine learning algorithms. We'll spend a lot more time looking at and using sklearn during Unit 02, but below is an example of training a model on the Iris dataset to predict what type of iris a new flower is.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Labels

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Gaussian Naive Bayes model
nb_model = GaussianNB()

# Train the model on the training data
nb_model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = nb_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Print detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
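Once the model is trained, classifying a brand-new flower takes one call to predict. The measurements below are hypothetical (note that scikit-learn's Iris features are ordered sepal length, sepal width, petal length, petal width):
# Classify a hypothetical new flower:
# [sepal length, sepal width, petal length, petal width] in cm
new_flower = [[5.0, 3.4, 1.5, 0.2]]
predicted_class = nb_model.predict(new_flower)[0]
print("Predicted species:", iris.target_names[predicted_class])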
Type your name here: