The police need some help with those letters Detective Caulfield delivered during class. Each letter appears to be made from newspaper cutouts pasted together. The police don't want to do all the manual work of converting the letters into a text format, so they've asked for your help!
You can access the letters in the following Google Drive folder: Homework03 Data
As you try to fight off the spring break hangover, you realize you can treat this as an image segmentation task: we need to figure out where in the image each letter is. After a little hair of the dog, you have another breakthrough idea: what if you just perform a clustering task with only two clusters, foreground and background? Using the clustering methods we saw in class, you decide to give it a shot!
##### IMPORTS #####
import cv2
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC
# Set random seed for NumPy’s random number generator
# Conceptually: makes random operations (e.g., random initialization in KMeans, shuffling data, or selecting random samples) produce the same results every time the code is run
# aka your code becomes reproducible
np.random.seed(42)
We can use the imread function from the cv2 library to read our images in. Each image should end up with shape (595, 420, 3) and each mask with shape (595, 420).
# TODO: Define paths (modify these according to your actual data paths)
image_1_path = 'homework03/note_page_1.png'
mask_1_path = 'homework03/mask_page_1.png'

image_2_path = 'homework03/note_page_2.png'
mask_2_path = 'homework03/mask_page_2.png'

image_3_path = 'homework03/note_page_3.png'
mask_3_path = 'homework03/mask_page_3.png'

image_4_path = 'homework03/note_page_4.png'
mask_4_path = 'homework03/mask_page_4.png'
image_1 = cv2.imread(image_1_path)
image_1 = cv2.cvtColor(image_1, cv2.COLOR_BGR2RGB)
mask_1 = cv2.imread(mask_1_path, cv2.IMREAD_GRAYSCALE)

image_2 = cv2.imread(image_2_path)
image_2 = cv2.cvtColor(image_2, cv2.COLOR_BGR2RGB)
mask_2 = cv2.imread(mask_2_path, cv2.IMREAD_GRAYSCALE)

image_3 = cv2.imread(image_3_path)
image_3 = cv2.cvtColor(image_3, cv2.COLOR_BGR2RGB)
mask_3 = cv2.imread(mask_3_path, cv2.IMREAD_GRAYSCALE)

image_4 = cv2.imread(image_4_path)
image_4 = cv2.cvtColor(image_4, cv2.COLOR_BGR2RGB)
mask_4 = cv2.imread(mask_4_path, cv2.IMREAD_GRAYSCALE)
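OpenCV's imread returns color channels in BGR order, which is why the loading code converts each image to RGB. As a rough illustration (using a made-up two-pixel array rather than the homework images), the conversion amounts to reversing the channel axis:

```python
import numpy as np

# Tiny synthetic "BGR" image (illustrative data, not a homework image):
# the first pixel is pure blue, the second is pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order.
rgb = bgr[..., ::-1]

print(bgr.shape, rgb.shape)
print("blue pixel in RGB order:", rgb[0, 0].tolist())
print("red pixel in RGB order:", rgb[0, 1].tolist())
```

The image shape is unchanged; only the order of the three values per pixel differs.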
You remember there were a number of clustering methods in class, but you also remember that k-means clustering was typically the one used as a sort of litmus test, so you figure that may as well be the place to start.
# -------------------------------------
# Task 1: Perform Kmeans Segmentation
# -------------------------------------
def kmeans_segmentation(image, pca_components=None):
    """
    Perform K-Means segmentation on an image
    Args:
        image: RGB image
        pca_components: Number of PCA components to use (None for no PCA)
    Returns:
        segmented_image: Image with pixel values replaced by cluster labels
    """
    # Reshape the image into a 2D array of pixels and 3 color values (RGB)
    pixels = image.reshape(-1, 3)

    # Apply PCA if specified
    if pca_components is not None:
        # TODO: Standardize pixel values to have zero mean and unit variance
        # TODO: Apply PCA to reduce dimensionality
        # TODO: Assign PCA-transformed data to X
        X = ...
    else:
        # If no PCA is wanted, just set X to the reshaped image
        X = ...

    # TODO: Perform K-Means clustering, fit KMeans to data to get cluster labels for each pixel
    # hint: we want to cluster the foreground and the background, so how many clusters should you use?

    # Reshape the labels back to the image shape
    segmented_image = labels.reshape(image.shape[:2])

    return segmented_image
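The two reshape steps in the scaffold can be checked in isolation. The sketch below uses a tiny made-up image and a simple brightness threshold as a stand-in for the cluster labels (it is not a clustering algorithm and not the intended solution); the point is only the shapes going in and out:

```python
import numpy as np

# Made-up 4 x 3 RGB image: top two rows white, bottom two rows black.
height, width = 4, 3
image = np.zeros((height, width, 3), dtype=np.uint8)
image[:2] = 255

# Flatten to one row per pixel, three color values per row.
pixels = image.reshape(-1, 3)

# Stand-in for per-pixel cluster labels (an assumption for illustration):
# 1 for bright pixels, 0 for dark ones.
labels = (pixels.mean(axis=1) > 127).astype(int)

# Fold the flat label vector back into the image's height x width grid.
segmented = labels.reshape(image.shape[:2])

print(pixels.shape, labels.shape, segmented.shape)
```

Because reshape preserves row-major order, the label at flat index `r * width + c` lands back at position `(r, c)`.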
The next one, agglomerative clustering, will likely crash your kernel when run on every pixel of the image; it's included as a pedagogical tool.
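A back-of-the-envelope estimate suggests why. The arithmetic below assumes the algorithm has to hold one 8-byte distance for every pair of pixels; that is an assumption about a naive implementation, and the exact memory use of your installed library may differ:

```python
# Image size taken from the homework text.
height, width = 595, 420
n_pixels = height * width

# Number of unordered pixel pairs: n * (n - 1) / 2.
n_pairs = n_pixels * (n_pixels - 1) // 2

# Assumption: one float64 (8 bytes) stored per pair.
bytes_needed = n_pairs * 8
gigabytes = bytes_needed / 1e9

print(f"{n_pixels:,} pixels -> {n_pairs:,} pairs -> about {gigabytes:.0f} GB")
```

Under that assumption the pairwise distances alone would need far more memory than a typical laptop has.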
# -------------------------------------
# Task 2: Perform Agglomerative Segmentation
# -------------------------------------
def agglomerative_segmentation(image, pca_components=None):
    """
    Perform Agglomerative Clustering on an image
    Args:
        image: RGB image
        pca_components: Number of PCA components to use (None for no PCA)
    Returns:
        segmented_image: Image with pixel values replaced by cluster labels
    """
    # Reshape the image into a 2D array of pixels and 3 color values (RGB)
    pixels = image.reshape(-1, 3)

    # Apply PCA if specified
    if pca_components is not None:
        # TODO: Standardize pixel values to have zero mean and unit variance
        # TODO: Apply PCA to reduce dimensionality
        # TODO: Assign PCA-transformed data to X
        X = ...
    else:
        X = ...

    # TODO: Apply Agglomerative Clustering

    # Reshape the labels back to the image shape
    segmented_image = labels.reshape(image.shape[:2])

    return segmented_image
# -------------------------------------
# Task 3: Perform Gaussian Mixture Model Segmentation
# -------------------------------------
def gmm_segmentation(image, pca_components=None):
    """
    Perform Gaussian Mixture Model segmentation on an image
    Args:
        image: RGB image
        pca_components: Number of PCA components to use (None for no PCA)
    Returns:
        segmented_image: Image with pixel values replaced by component labels
    """
    # Reshape the image into a 2D array of pixels and 3 color values (RGB)
    pixels = image.reshape(-1, 3)

    # Apply PCA if specified
    if pca_components is not None:
        # TODO: Standardize pixel values to have zero mean and unit variance
        # TODO: Apply PCA to reduce dimensionality
        # TODO: Assign PCA-transformed data to X
        X = ...
    else:
        X = ...

    # TODO: Apply GMM

    # Reshape the labels back to the image shape
    segmented_image = labels.reshape(image.shape[:2])

    return segmented_image
We'll need to evaluate how well our segmentation methods are doing, and we can use the intersection over union (IoU) metric to do so.
Intersection over Union (IoU) is defined as the intersection of the predicted mask and the ground truth mask divided by the union of the predicted mask and the ground truth mask.
$$IoU = \frac{TP}{TP + FP + FN}$$
Where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives.
Really, all this boils down to is how close our predicted labels are to the ground-truth mask once every background-cluster pixel is set to 0 (black) and every foreground-cluster pixel is set to 1.
In order to actually assess how well your models did, you of course need something to compare against, so the police manually annotated 4 letters and created masks for them.
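To make the IoU formula concrete, here is a small worked example on two made-up 2 x 3 masks (illustrative data only, not the police annotations). It counts TP, FP, and FN directly and plugs them into the formula:

```python
import numpy as np

# Made-up binary masks: True marks a foreground pixel.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]], dtype=bool)
truth = np.array([[1, 0, 0],
                  [0, 1, 1]], dtype=bool)

tp = np.logical_and(pred, truth).sum()    # predicted foreground, truly foreground
fp = np.logical_and(pred, ~truth).sum()   # predicted foreground, truly background
fn = np.logical_and(~pred, truth).sum()   # predicted background, truly foreground

iou = tp / (tp + fp + fn)
print(f"TP={tp}, FP={fp}, FN={fn}, IoU={iou}")
```

Here two pixels agree, one is a false positive, and one is a false negative, so the IoU is 2 / (2 + 1 + 1) = 0.5. True negatives do not appear in the formula at all.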
# -------------------------------------
# Task 6: Segmentation Metrics
# -------------------------------------
def calculate_segmentation_metrics(pred, truth, invert=False, visualize=False):
    """
    Calculate Intersection over Union (IoU) to measure segmentation accuracy
    Args:
        pred: Binary prediction mask (595, 420)
        truth: Binary ground truth mask (595, 420)
        invert: If True, flip the foreground and background labels in pred
        visualize: If True, plot the prediction and ground truth masks side by side
    Returns:
        iou: Intersection over Union (IoU) metric
    """
    if visualize:
        # Visualize the masks
        plt.figure(figsize=(10, 5))

        plt.subplot(1, 2, 1)
        plt.title('Prediction Mask')
        plt.imshow(pred, cmap='gray')
        plt.axis('off')

        plt.subplot(1, 2, 2)
        plt.title('Ground Truth Mask')
        plt.imshow(truth, cmap='gray')
        plt.axis('off')

        plt.show()

    # TODO: Convert pred to boolean mask

    # TODO: If invert is true, then flip foreground and background labels
    if invert:
        pass

    # TODO: Convert truth to boolean mask

    # TODO: Calculate intersection and union

    # TODO: Calculate IoU (Intersection over Union)
    # hint: this is also known as the Jaccard index
    iou = ...
    print('IoU:', iou)
    return iou
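The `invert` flag exists because cluster ids are arbitrary: a clustering method may call the foreground cluster 0 and the background cluster 1. The sketch below uses a hypothetical helper `iou_of` and made-up masks to show how much the score changes when the labels are flipped; treating an empty union as a perfect score is an assumption made here for illustration:

```python
import numpy as np

def iou_of(pred, truth):
    """Hypothetical helper: IoU of two masks via intersection and union."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return np.logical_and(pred, truth).sum() / union

# Made-up ground truth and cluster labels whose ids happen to be swapped.
truth = np.array([[0, 1, 1],
                  [0, 1, 0]])
labels = np.array([[1, 0, 0],
                   [1, 0, 0]])

direct = iou_of(labels, truth)        # foreground/background ids swapped
flipped = iou_of(1 - labels, truth)   # same clustering with ids flipped

print("direct:", direct, "flipped:", flipped)
```

The same clustering scores 0.0 one way and 0.75 the other, so a very low IoU is often a sign that the labels need inverting rather than that the segmentation failed.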
We should probably explore how well each of our unsupervised segmentation methods worked, with different numbers of PCA components.
print("\nPerforming Unsupervised Segmentation...")
for pca_components in [None, 1, 2]:
    # TODO: Run KMeans segmentation, evaluate how well it did
    print(f"Running KMeans with PCA components: {pca_components}")
    segmented_image_kmeans = ...

    # TODO: Run GMM, evaluate how well it did
    print(f"Running Gaussian Mixture Model with PCA components: {pca_components}")
    segmented_image_gmm = ...
Performing Unsupervised Segmentation...
Running KMeans with PCA components: None
IoU: 0.8150511446663399
Running Gaussian Mixture Model with PCA components: None
IoU: 0.8456018086408212
Running KMeans with PCA components: 1
IoU: 0.7816316084529681
Running Gaussian Mixture Model with PCA components: 1
IoU: 0.8448758633563543
Running KMeans with PCA components: 2
IoU: 0.7847515051063914
Running Gaussian Mixture Model with PCA components: 2
IoU: 0.8456018086408212
As we talked about in class, k-means can be non-deterministic, so please don't fret if your numbers are a little different.
The unsupervised segmentation seems to work pretty well, but you wonder if a supervised method might work even better. We could probably set this up in a similar way to the clustering task, but instead of using two clusters (foreground and background), we can have two classes (foreground and background).
Luckily the police have manually created a binary mask for us that we can use as labelled data to train our supervised model.
Unluckily, each pixel needs to know about its surrounding pixels, so we'll need to create a feature vector for each pixel that contains information about both the pixel and its neighbors.
Maybe we can take a patch around each pixel and use the RGB values of the pixels in the patch as features?
# -------------------------------------
# Task 4: Write Extract Features Function
# -------------------------------------
def extract_features(image, patch_size=5):
    """
    Extract features for each pixel using RGB values and local neighborhood.
    Args:
        image: RGB image (595, 420, 3)
        patch_size: Size of the local neighborhood patch (must be odd)
    Returns:
        features: Array of shape (n_pixels, n_features) = (249900, 78),
                  i.e. (595 x 420, 5 x 5 x 3 + 1 x 3)
    """
    height, width = ...

    # TODO: Add padding to image
    padding = ...
    padded = cv2.copyMakeBorder(
        image,
        padding, padding, padding, padding,
        cv2.BORDER_REFLECT
    )

    # TODO: Calculate total number of pixels
    n_pixels = ...

    # TODO: Calculate total number of features per pixel
    n_features = ...

    # TODO: Initialize data structure to store features for all pixels
    features = ...

    # TODO: Loop through pixels and extract features for each pixel
    # hint: the feature vector will be all RGB values for every surrounding pixel concatenated with the pixel itself
    # hint: this results in a 78 dimensional feature vector, for each pixel!
    # hint: 5 x 5 patch = 25 pixels, 3 colors per pixel gives 3 x 25 = 75, + 3 for the pixel itself again gives 78
    # hint: use current pixel RGB values and the surrounding local patch

    # Return feature matrix
    return features
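The 78-value arithmetic in the hints can be checked for a single pixel. This is a minimal sketch on a small random array, not the intended solution: `pixel_features` is a hypothetical helper, and NumPy's `np.pad` with `mode="symmetric"` is used here as a stand-in for the `cv2.copyMakeBorder` call with `cv2.BORDER_REFLECT`:

```python
import numpy as np

def pixel_features(padded, row, col, patch_size):
    """Hypothetical helper: features for the pixel at (row, col) of the
    unpadded image, read from the padded image."""
    pad = patch_size // 2
    # The patch centered on (row, col) starts at (row, col) in padded coordinates.
    patch = padded[row:row + patch_size, col:col + patch_size]
    center = padded[row + pad, col + pad]
    # All patch values, then the center pixel's own RGB values once more.
    return np.concatenate([patch.reshape(-1), center])

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(6, 4, 3), dtype=np.uint8)  # made-up image

patch_size = 5
pad = patch_size // 2
# Pad rows and columns only; leave the color channel axis untouched.
padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="symmetric")

vec = pixel_features(padded, 0, 0, patch_size)
print(padded.shape, vec.shape)
```

Padding is what lets corner pixels such as (0, 0) have a full 5 x 5 neighborhood, and the last three entries of the vector are the pixel's own RGB values.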
# -------------------------------------
# Task 5: SVM Segmentation
# -------------------------------------
def svm_segmentation(train_image, train_mask, test_image, kernel='rbf', pca_components=10):
    # TODO: Extract features of training image
    X = ...
    y = (train_mask > 0).reshape(-1)

    # TODO: Scale features

    # TODO: Apply PCA

    # TODO: Train SVM

    # TODO: Extract features of testing image

    # TODO: Apply transformations to testing image
    # hint: transformations means scaling and dimensionality reduction

    # TODO: Predict on test image

    # Reshape predictions to match image matrix shape
    segmented_image = predictions.reshape(test_image.shape[:2])

    return segmented_image
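One detail worth spelling out in the scaffold above is that the scaling and dimensionality reduction are fitted on the training features only and then reused, unchanged, on the test features. The sketch below shows that pattern with plain NumPy stand-ins for the scaler and PCA (the helper names and random data are hypothetical, and this is not the library implementation):

```python
import numpy as np

def fit_scaler_and_pca(X_train, n_components):
    """Learn the mean, std, and principal directions from training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    Z = (X_train - mean) / std
    # Rows of vt are the principal directions of the standardized data.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, vt[:n_components]

def apply_scaler_and_pca(X, mean, std, components):
    """Reuse the training statistics; nothing is re-fitted here."""
    return ((X - mean) / std) @ components.T

rng = np.random.default_rng(0)
X_train = rng.normal(loc=100.0, scale=20.0, size=(200, 78))  # made-up features
X_test = rng.normal(loc=100.0, scale=20.0, size=(50, 78))

mean, std, components = fit_scaler_and_pca(X_train, n_components=10)
train_reduced = apply_scaler_and_pca(X_train, mean, std, components)
test_reduced = apply_scaler_and_pca(X_test, mean, std, components)

print(train_reduced.shape, test_reduced.shape)
```

Fitting a second scaler or PCA on the test image would put the test features in a different coordinate system from the one the classifier was trained in.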
Now we can try our supervised segmentation method (SVM) with different kernels and different numbers of PCA components.
This will take a while to run, so be patient!
I encourage you to try different numbers of PCA components to see if you can beat my results; the three I tested were 1, 5, and 10.
print("\nPerforming Supervised Segmentation...")
for kernel in ['linear', 'poly', 'rbf']:  # Conceptual question: what do 'linear', 'poly' and 'rbf' stand for? Why are we trying each of them as our kernel variable?
    for pca_components in [1, 5, 10]:
        # TODO: SVM segmentation and evaluation
        print(f"Running SVM with kernel: {kernel} and PCA components: {pca_components}")
        segmented_image_svm = ...
Performing Supervised Segmentation...
Running SVM with kernel: linear and PCA components: 1
IoU: 0.7939733382706887
Running SVM with kernel: linear and PCA components: 5
IoU: 0.7906213147732772
Running SVM with kernel: linear and PCA components: 10
IoU: 0.8016820011500843
Running SVM with kernel: poly and PCA components: 1
IoU: 0.7570233911041068
Running SVM with kernel: poly and PCA components: 5
IoU: 0.8301359011807786
Running SVM with kernel: poly and PCA components: 10
IoU: 0.8775227199564151
Running SVM with kernel: rbf and PCA components: 1
IoU: 0.813663710080135
Running SVM with kernel: rbf and PCA components: 5
IoU: 0.914427174421753
Running SVM with kernel: rbf and PCA components: 10
IoU: 0.9516093009405142
This computer vision stuff is outside the scope of this course, so the code is just provided for you. But this is how we can extract the region of each letter using our segmentation!
from skimage.measure import label, regionprops
import os
def extract_and_save_letters(segmented_image, original_image, output_dir='letters'):
    """
    Extract each segmented letter and save as a separate PNG file.
    Args:
        segmented_image: Segmented image with unique labels for each letter
        original_image: Original RGB image
        output_dir: Directory to save the extracted letter images
    """
    # Ensure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Label connected components
    labeled_image = label(segmented_image)

    # Iterate over each labeled region
    for region in regionprops(labeled_image):
        # Extract the bounding box of the region
        min_row, min_col, max_row, max_col = region.bbox

        # Extract the region from the original image
        letter_image = original_image[min_row:max_row, min_col:max_col]

        # Save the extracted letter as a PNG file
        letter_filename = os.path.join(output_dir, f'letter_{region.label}.png')
        cv2.imwrite(letter_filename, cv2.cvtColor(letter_image, cv2.COLOR_RGB2BGR))
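The cropping step relies on the bounding box being a half-open interval: the max row and column are one past the last pixel of the region, so they can be used directly as slice ends. A NumPy-only sketch of the same idea for a single made-up region (not using skimage) looks like this:

```python
import numpy as np

# Made-up binary mask containing one rectangular "letter".
mask = np.zeros((6, 8), dtype=bool)
mask[1:4, 2:5] = True

# Coordinates of all foreground pixels.
rows, cols = np.nonzero(mask)

# Half-open bounding box: max values are one past the last foreground pixel.
min_row, max_row = int(rows.min()), int(rows.max()) + 1
min_col, max_col = int(cols.min()), int(cols.max()) + 1

crop = mask[min_row:max_row, min_col:max_col]
print((min_row, min_col, max_row, max_col), crop.shape)
```

For a real page with many letters, connected-component labeling is what separates the regions first; this sketch only covers the box-and-slice step for one region.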
Let's actually use our best combination to extract each letter as a PNG!
segmented_image = svm_segmentation(train_image, train_mask, test_image, kernel='rbf', pca_components=10)
extract_and_save_letters(segmented_image, test_image)
Now that we have all of the letters extracted, we can perform Optical Character Recognition (OCR) on each letter to extract the text. In the next homework, we'll explore how to do this with a basic Feed-Forward Neural Network (which you'll be writing from scratch) and a Convolutional Neural Network (which you'll be using from PyTorch).