Homework 3: Image Segmentation

  1. Given a data set of unlabeled ransom letter images
  2. KMeans Clustering Segmentation with PCA
  3. Agglomerative Clustering Segmentation with PCA
  4. Gaussian Mixture Model Clustering with PCA
  5. Train a Support Vector Classifier with different kernels and PCA
  6. Evaluate the models
  7. Use the models on the images to create a png of each letter

Homework 3: Encoded Letters

The police need some help with those letters Detective Caulfield delivered during class. The letters appear to be newspaper cutouts all combined together into a single note. The police don't want to do all the manual work of converting the letters into a text format, so they've asked for your help!

You can access the letters in the following Google Drive folder: Homework03 Data

As you try to fight off the spring break hangover, you realize you can treat this as an image segmentation task: we need to figure out where each letter is within the entire image. After a little hair of the dog, you have another breakthrough idea: what if you just perform a clustering task with only two clusters, foreground and background? Using the clustering methods we saw in class, you decide to give it a shot!

##### IMPORTS #####
import cv2

import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

# Set random seed for NumPy’s random number generator
#     Conceptually: makes random operations (e.g., random initialization in KMeans, shuffling data, or selecting random samples) produce the same results every time the code is run
#     i.e., your code becomes reproducible
np.random.seed(42)

Loading Data

We can use the imread function from the cv2 library to read our images in. Note that cv2.imread loads images in BGR channel order, so we convert each one to RGB right after loading. We should end up with each image having shape (595, 420, 3) and each mask having shape (595, 420).

# TODO: Define paths (modify these according to your actual data paths)
image_1_path = 'homework03/note_page_1.png'
mask_1_path = 'homework03/mask_page_1.png'

image_2_path = 'homework03/note_page_2.png'
mask_2_path = 'homework03/mask_page_2.png'

image_3_path = 'homework03/note_page_3.png'
mask_3_path = 'homework03/mask_page_3.png'

image_4_path = 'homework03/note_page_4.png'
mask_4_path = 'homework03/mask_page_4.png'


image_1 = cv2.imread(image_1_path)
image_1 = cv2.cvtColor(image_1, cv2.COLOR_BGR2RGB)
mask_1 = cv2.imread(mask_1_path, cv2.IMREAD_GRAYSCALE)

image_2 = cv2.imread(image_2_path)
image_2 = cv2.cvtColor(image_2, cv2.COLOR_BGR2RGB)
mask_2 = cv2.imread(mask_2_path, cv2.IMREAD_GRAYSCALE)

image_3 = cv2.imread(image_3_path)
image_3 = cv2.cvtColor(image_3, cv2.COLOR_BGR2RGB)
mask_3 = cv2.imread(mask_3_path, cv2.IMREAD_GRAYSCALE)

image_4 = cv2.imread(image_4_path)
image_4 = cv2.cvtColor(image_4, cv2.COLOR_BGR2RGB)
mask_4 = cv2.imread(mask_4_path, cv2.IMREAD_GRAYSCALE)
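
As a quick sanity check, you can print the shapes to confirm they match what we expect (this is optional and just echoes the shapes stated above):

print(image_1.shape)   # expected: (595, 420, 3)
print(mask_1.shape)    # expected: (595, 420)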

Unsupervised Learning Image Segmentation

You remember there were a number of clustering methods in class, but you also remember that k-means clustering was typically the one used as a sort of litmus test, so you figure that's as good a place to start as any.
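
If the scikit-learn clustering API feels rusty, here is a minimal sketch of the fit/predict pattern on a toy array (the toy data are made up purely for illustration); the function below asks you to apply the same idea to the image's pixels.

# Minimal sketch: cluster 100 toy "pixels" into two groups with KMeans
toy_pixels = np.random.rand(100, 3)                                # fake pixels, 3 channels each
toy_labels = KMeans(n_clusters=2, random_state=42).fit_predict(toy_pixels)
print(toy_labels.shape)                                            # one cluster label per toy pixel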

# -------------------------------------
# Task 1: Perform Kmeans Segmentation
# -------------------------------------
def kmeans_segmentation(image, pca_components=None):
    """
    Perform K-Means segmentation on an image

    Args:
        image: RGB image
        pca_components: Number of PCA components to use (None for no PCA)

    Returns:
        segmented_image: Image with pixel values replaced by cluster labels
    """
    # Reshape the image into a 2D array of pixels and 3 color values (RGB)
    pixels = image.reshape(-1, 3)

    # Apply PCA if specified
    if pca_components is not None:
        # TODO: Standardize pixel values to have zero mean and unit variance


        # TODO: Apply PCA to reduce dimensionality


        # TODO: Assign PCA-transformed data to X
        X = ...
    else:
        # If no PCA is wanted, just set X to the reshaped pixel array
        X = ...

    # TODO: Perform K-Means clustering: fit KMeans to the data to get cluster labels for each pixel
    #    hint: we want to cluster the foreground and the background, so how many clusters should you use?

    # Reshape the labels back to the image shape
    segmented_image = labels.reshape(image.shape[:2])

    return segmented_image

Agglomerative Clustering

This will likely crash your kernel; it's included as a pedagogical tool.
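
To see why, here is a rough back-of-the-envelope estimate, assuming the clustering has to hold a full pairwise-distance matrix over every pixel in memory:

# Rough estimate of the memory a full pairwise-distance matrix would need
n_pixels = 595 * 420                                  # 249,900 pixels per image
pairwise_entries = n_pixels * (n_pixels - 1) / 2      # number of unique pixel pairs
print(f"~{pairwise_entries * 8 / 1e9:.0f} GB of float64 distances")   # roughly 250 GB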

# -------------------------------------
# Task 2: Perform Agglomerative Segmentation
# -------------------------------------
def agglomerative_segmentation(image, pca_components=None):
    """
    Perform Agglomerative Clustering on an image

    Args:
        image: RGB image
        pca_components: Number of PCA components to use (None for no PCA)

    Returns:
        segmented_image: Image with pixel values replaced by cluster labels
    """

    # Reshape the image into a 2D array of pixels and 3 color values (RGB)
    pixels = image.reshape(-1, 3)

    # Apply PCA if specified
    if pca_components is not None:
        # TODO: Standardize pixel values to have zero mean and unit variance


        # TODO: Apply PCA to reduce dimensionality


        # TODO: Assign PCA-transformed data to X
        X = ...
    else:
        X = ...

    # TODO: Apply Agglomerative Clustering


    # Reshape the labels back to the image shape
    segmented_image = labels.reshape(image.shape[:2])

    return segmented_image
# -------------------------------------
# Task 3: Perform Gaussian Mixture Model Segmentation
# -------------------------------------
def gmm_segmentation(image, pca_components=None):
    """
    Perform Gaussian Mixture Model segmentation on an image

    Args:
        image: RGB image
        pca_components: Number of PCA components to use (None for no PCA)

    Returns:
        segmented_image: Image with pixel values replaced by component labels
    """

    # Reshape the image into a 2D array of pixels and 3 color values (RGB)
    pixels = image.reshape(-1, 3)

    # Apply PCA if specified
    if pca_components is not None:
        # TODO: Standardize pixel values to have zero mean and unit variance


        # TODO: Apply PCA to reduce dimensionality


        # TODO: Assign PCA-transformed data to X
        X = ...
    else:
        X = ...

    # TODO: Apply GMM


    # Reshape the labels back to the image shape
    segmented_image = labels.reshape(image.shape[:2])

    return segmented_image

Evaluating Clustering Segmentations

We'll need to evaluate how well our segmentation methods are doing, and the intersection over union (IoU) metric is a good way to do that.

Intersection over Union (IoU) is defined as the intersection of the predicted mask and the ground truth mask divided by the union of the predicted mask and the ground truth mask.

$$IoU = \frac{TP}{TP + FP + FN}$$

Where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives.

Really, all this boils down to is comparing our predicted labels against the ground-truth mask: if we set every background-cluster pixel to 0 and every foreground-cluster pixel to 1, how closely does the result match the mask?
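
As a tiny worked example on made-up 2 x 3 masks, just to illustrate the formula:

# Toy illustration of IoU on two small boolean masks
toy_pred  = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
toy_truth = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
intersection = np.logical_and(toy_pred, toy_truth).sum()   # 2 pixels are foreground in both
union        = np.logical_or(toy_pred, toy_truth).sum()    # 4 pixels are foreground in either
print(intersection / union)                                # IoU = 0.5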

In order to actually assess how well your models did, you of course need something to compare against, so the police manually annotated 4 letters and created masks for them.

# -------------------------------------
# Task 6: Segmentation Metrics
# -------------------------------------
def calculate_segmentation_metrics(pred, truth, invert=False, visualize=False):
    """
    Calculate Intersection over Union (IoU) to measure segmentation accuracy

    Args:
        pred: Binary prediction mask (595, 420)
        truth: Binary ground truth mask (595, 420)
        invert: If True, flip foreground and background labels in the prediction
        visualize: If True, display the prediction and ground truth masks side by side

    Returns:
        iou: Intersection over Union (IoU) metric
    """
    if visualize:
        # Visualize the masks
        plt.figure(figsize=(10, 5))

        plt.subplot(1, 2, 1)
        plt.title('Prediction Mask')
        plt.imshow(pred, cmap='gray')
        plt.axis('off')

        plt.subplot(1, 2, 2)
        plt.title('Ground Truth Mask')
        plt.imshow(truth, cmap='gray')
        plt.axis('off')

        plt.show()


    # TODO: Convert pred to boolean mask


    # TODO: If invert is true, then flip foreground and background labels
    if invert:
        pass

    # TODO: Convert truth to boolean mask


    # TODO: Calculate intersection and union


    # TODO: calculate IoU (Intersection over Union)
    #    hint: this is also known as the Jaccard index
    iou = ...
    print('IoU:', iou)

    return iou

Unsupervised Segmentation

We should probably explore how well each of our unsupervised segmentation methods worked, with different numbers of PCA components.

print("\nPerforming Unsupervised Segmentation...")
for pca_components in [None, 1, 2]:
    # TODO: Run Kmeans segmentation, evaluate how well it did
    print(f"Running KMeans with PCA components: {pca_components}")
    segmented_image_kmeans = ...


    # TODO Run GMM, evaluate how well it did
    print(f"Running Gaussian Mixture Model with PCA components: {pca_components}")
    segmented_image_gmm = ...

Expected Output

Performing Unsupervised Segmentation...
Running KMeans with PCA components: None
IoU: 0.8150511446663399
Running Gaussian Mixture Model with PCA components: None
IoU: 0.8456018086408212
Running KMeans with PCA components: 1
IoU: 0.7816316084529681
Running Gaussian Mixture Model with PCA components: 1
IoU: 0.8448758633563543
Running KMeans with PCA components: 2
IoU: 0.7847515051063914
Running Gaussian Mixture Model with PCA components: 2
IoU: 0.8456018086408212

As we talked about in class, k-means can be non-deterministic, so please don't fret if your numbers are a little different.
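
In particular, which cluster ends up labeled 0 and which ends up labeled 1 is arbitrary; that's what the invert flag in calculate_segmentation_metrics is for. Here's a minimal sketch of scoring both orientations and keeping the better one, assuming you've filled in the functions above and are comparing a page-1 segmentation against mask_1:

# Because cluster ids are arbitrary, score both label orientations and keep the better one
iou_a = calculate_segmentation_metrics(segmented_image_kmeans, mask_1)
iou_b = calculate_segmentation_metrics(segmented_image_kmeans, mask_1, invert=True)
print(max(iou_a, iou_b))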

Supervised Segmentation

The unsupervised segmentation seems to work pretty well, but you wonder if a supervised method might work even better. We could probably set this up in a similar way to the clustering task, but instead of using two clusters (foreground and background), we can have two classes (foreground and background).

Luckily the police have manually created a binary mask for us that we can use as labelled data to train our supervised model.

Unluckily, we need to somehow let each pixel know about its surrounding pixels, so we'll need to create a feature vector for each pixel that contains information about both the pixel and its neighbors.

Maybe we can take a patch around each pixel and use the RGB values of the pixels in the patch as features?
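
To make that concrete, here is a minimal sketch of building the feature vector for a single pixel of image_1 from a reflect-padded copy (the (row, col) position is arbitrary); the function below does the same thing for every pixel.

# Sketch: feature vector for one pixel = flattened 5x5 RGB patch + the pixel's own RGB values
patch_size = 5
padding = patch_size // 2
padded = cv2.copyMakeBorder(image_1, padding, padding, padding, padding, cv2.BORDER_REFLECT)

row, col = 100, 200                                            # arbitrary pixel position
patch = padded[row:row + patch_size, col:col + patch_size]     # 5 x 5 x 3 neighborhood centered on (row, col)
feature = np.concatenate([patch.reshape(-1), image_1[row, col]])
print(feature.shape)                                           # (78,) = 5*5*3 + 3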

# -------------------------------------
# Task 4: Write Extract Features Function
# -------------------------------------
def extract_features(image, patch_size=5):
    """
    Extract features for each pixel using RGB values and local neighborhood.

    Args:
        image: RGB image (595, 420, 3)
        patch_size: Size of the local neighborhood patch (must be odd)

    Returns:
        features: Array of shape (n_pixels, n_features) = (249900, 78), since 595 x 420 = 249900 pixels and 5 x 5 x 3 + 3 = 78 features per pixel
    """
    height, width = ...

    # TODO: Add padding to image
    padding = ...

    padded = cv2.copyMakeBorder(
        image,
        padding, padding, padding, padding,
        cv2.BORDER_REFLECT
    )


    # TODO: Calculate total number of pixels
    n_pixels = ...

    # TODO: Calculate total number of features per pixel
    n_features = ...

    # TODO: Initialize data structure to store features for all pixels
    features = ...

    # TODO: Loop through pixels and extracts features for each pixel
    #     hint: the feature vector will be all RGB values for every surrounding pixel concatenated with the pixel itself
    #     hint: this results in a 78 dimensional feature vector, for each pixel!
    #     hint: a 5 x 5 patch = 25 pixels, 3 colors per pixel gives 3 x 25 = 75, plus 3 for the pixel itself again gives 78
    #     hint: use current pixel RGB values and the surrounding local patch


    # Return feature matrix 
    return features
# -------------------------------------
# Task 5: SVM Segmentation
# -------------------------------------
def svm_segmentation(train_image, train_mask, test_image, kernel='rbf', pca_components=10):
    """
    Train an SVM on a labeled image/mask pair and segment a test image

    Args:
        train_image: RGB training image
        train_mask: Binary ground truth mask for the training image
        test_image: RGB image to segment
        kernel: SVM kernel to use ('linear', 'poly', or 'rbf')
        pca_components: Number of PCA components to use

    Returns:
        segmented_image: Image with pixel values replaced by predicted class labels
    """
    # TODO: Extract features of training image
    X = ...
    y = (train_mask > 0).reshape(-1)

    # TODO: Scale features


    # TODO: Apply PCA


    # TODO: Train SVM


    # TODO: Extract features of testing image


    # TODO: Apply transformations to testing image
    #     hint: transformations means scaling and dimensionality reduction


    # TODO: Predict on test image


    # Reshape predictions to match image matrix shape
    segmented_image = predictions.reshape(test_image.shape[:2])

    return segmented_image

Supervised Segmentation

Now we can try our supervised segmentation method (SVM) with different kernels and different numbers of PCA components.

This will take a while to run, so be patient!

I encourage you to try different numbers of PCA components to see if you can beat my results, but the three I tested were 1, 5, and 10.
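
As a reminder of where the kernel choice comes in, here is a minimal sketch on made-up features and labels: the kernel is just a constructor argument to SVC, which is why the loop below can simply swap it out.

# Sketch: swapping kernels is a one-argument change to SVC
toy_X = np.random.rand(50, 10)
toy_y = np.random.randint(0, 2, size=50)
for toy_kernel in ['linear', 'poly', 'rbf']:
    print(toy_kernel, SVC(kernel=toy_kernel).fit(toy_X, toy_y).score(toy_X, toy_y))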

print("\nPerforming Supervised Segmentation...")
for kernel in ['linear', 'poly', 'rbf']:       # Conceptual question: what do 'linear', 'poly' and 'rbf' stand for? Why are we trying each of them as our kernel variable?
    for pca_components in [1, 5, 10]:
        # TODO: SVM segmentation and evaluation
        print(f"Running SVM with kernel: {kernel} and PCA components: {pca_components}")
        segmented_image_svm = ...

Expected Output

Performing Supervised Segmentation...
Running SVM with kernel: linear and PCA components: 1
IoU: 0.7939733382706887
Running SVM with kernel: linear and PCA components: 5
IoU: 0.7906213147732772
Running SVM with kernel: linear and PCA components: 10
IoU: 0.8016820011500843
Running SVM with kernel: poly and PCA components: 1
IoU: 0.7570233911041068
Running SVM with kernel: poly and PCA components: 5
IoU: 0.8301359011807786
Running SVM with kernel: poly and PCA components: 10
IoU: 0.8775227199564151
Running SVM with kernel: rbf and PCA components: 1
IoU: 0.813663710080135
Running SVM with kernel: rbf and PCA components: 5
IoU: 0.914427174421753
Running SVM with kernel: rbf and PCA components: 10
IoU: 0.9516093009405142

Region Extraction

This computer vision stuff is outside the scope of this course, so the code is just provided for you. But this is how we can extract the region of each letter using our segmentation!

from skimage.measure import label, regionprops
import os

def extract_and_save_letters(segmented_image, original_image, output_dir='letters'):
    """
    Extract each segmented letter and save as a separate PNG file.
    
    Args:
        segmented_image: Segmented image with unique labels for each letter
        original_image: Original RGB image
        output_dir: Directory to save the extracted letter images
    """
    # Ensure the output directory exists
    os.makedirs(output_dir, exist_ok=True)
    
    # Label connected components
    labeled_image = label(segmented_image)
    
    # Iterate over each labeled region
    for region in regionprops(labeled_image):
        # Extract the bounding box of the region
        min_row, min_col, max_row, max_col = region.bbox
        
        # Extract the region from the original image
        letter_image = original_image[min_row:max_row, min_col:max_col]
        
        # Save the extracted letter as a PNG file
        letter_filename = os.path.join(output_dir, f'letter_{region.label}.png')
        cv2.imwrite(letter_filename, cv2.cvtColor(letter_image, cv2.COLOR_RGB2BGR)) 

Wrapping up

Let's actually use our best combination to extract each letter as a PNG! (Before running this, set train_image, train_mask, and test_image to pages and masks loaded above, e.g., train on one annotated page and test on another.)

segmented_image = svm_segmentation(train_image, train_mask, test_image, kernel='rbf', pca_components=10)
extract_and_save_letters(segmented_image, test_image)

Up Next

Now that we have all of the letters extracted, we can perform Optical Character Recognition (OCR) on each letter to extract the text. In the next homework, we'll explore how to do this with a basic Feed-Forward Neural Network (which you'll be writing from scratch) and a Convolutional Neural Network (which you'll be using from PyTorch).