Machine Learning · Medical Imaging / Neural Networks

Blood Cell MLP Classifier

A Multi-Layer Perceptron model for automated blood cell classification from microscopic images, achieving 99.56% validation accuracy on the Blood Cell Detection Dataset.

The Problem

Accurate identification of blood cell types is critical for medical diagnostics and treatment planning. Traditional manual classification via microscopy is labour-intensive, subjective, and prone to error.

This project investigates applying a Multi-Layer Perceptron to automate blood cell classification, highlighting the potential of neural networks to enhance clinical decision-making with faster, more consistent outcomes.

How It Works

🔬 Data Processing

The Blood Cell Detection Dataset contains 100 high-resolution microscopic images with 2,340 annotated cell regions. Each cell is cropped, resized to 64×64 pixels, normalized to [0,1], and one-hot encoded for classification.
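
A minimal sketch of that preprocessing pipeline, assuming the annotations arrive as a CSV of bounding boxes (the file name and column names here are illustrative, not the dataset's actual schema):

```python
import cv2
import numpy as np
import pandas as pd
from tensorflow.keras.utils import to_categorical

# Illustrative annotation schema: image path, bounding box, and class label.
annotations = pd.read_csv("annotations.csv")  # columns: image, xmin, ymin, xmax, ymax, label

labels = sorted(annotations["label"].unique())
label_to_index = {name: i for i, name in enumerate(labels)}

X, y = [], []
for _, row in annotations.iterrows():
    image = cv2.imread(row["image"])
    # Crop the annotated cell region and resize it to the fixed 64x64 input size.
    cell = image[row["ymin"]:row["ymax"], row["xmin"]:row["xmax"]]
    cell = cv2.resize(cell, (64, 64))
    X.append(cell.astype("float32") / 255.0)        # normalize pixels to [0, 1]
    y.append(label_to_index[row["label"]])

X = np.stack(X)                                      # shape: (n_cells, 64, 64, 3)
y = to_categorical(y, num_classes=len(labels))       # one-hot encode the targets
```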

🧠 MLP Architecture

A fully connected neural network with an input layer that flattens image data, two dense hidden layers with ReLU activation and dropout regularization, and a softmax output layer for multi-class classification.
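
A sketch of what that architecture could look like in Keras; the hidden-layer widths and dropout rate below are illustrative choices, not the project's exact configuration:

```python
from tensorflow.keras import layers, models

num_classes = 3  # illustrative; set to the number of cell types in the dataset

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Flatten(),                      # flatten image into a feature vector
    layers.Dense(512, activation="relu"),  # first fully connected hidden layer
    layers.Dropout(0.3),                   # dropout regularization
    layers.Dense(256, activation="relu"),  # second hidden layer
    layers.Dropout(0.3),
    layers.Dense(num_classes, activation="softmax"),  # class probabilities
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```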

⚡ Forward Pass

Each neuron computes a weighted sum of inputs, adds a bias, and applies ReLU activation: a = max(0, Σ wᵢxᵢ + b). This nonlinearity allows the network to learn complex relationships in the data.
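
In NumPy terms, one dense layer's forward pass is just a matrix-vector product, a bias add, and an elementwise ReLU; a minimal sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(12_288)                          # flattened 64x64x3 input (illustrative)
W = rng.standard_normal((128, 12_288)) * 0.01   # weights of one hidden layer
b = np.zeros(128)                               # biases

z = W @ x + b            # weighted sum of inputs plus bias for every neuron
a = np.maximum(0.0, z)   # ReLU nonlinearity
```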

🔄 Backpropagation

Differentiating cross-entropy loss with softmax yields: ∂L/∂W = (p − y)xᵀ. The gradient is the outer product of the error (predicted minus true) and the previous layer's activations, propagated backward to update each weight.
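
A small sketch of that output-layer gradient and a plain gradient-descent update; the shapes and the learning rate are arbitrary example values:

```python
import numpy as np

# Values carried over from a forward pass (shapes are illustrative).
x = np.random.default_rng(1).random(256)   # previous layer's activations
W = np.zeros((3, 256))                     # output-layer weights, 3 classes
p = np.array([0.7, 0.2, 0.1])              # softmax probabilities (predicted)
y = np.array([1.0, 0.0, 0.0])              # one-hot true label

grad_W = np.outer(p - y, x)   # dL/dW: outer product of error and activations
grad_b = p - y                # dL/db

learning_rate = 0.01
W -= learning_rate * grad_W   # gradient-descent weight update
```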

📊 Softmax Output

The final layer transforms raw logits into a probability distribution using softmax: pᵢ = e^(zᵢ) / Σⱼ e^(zⱼ). The class with highest probability is selected as the prediction.
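
A minimal NumPy softmax; subtracting the maximum logit before exponentiating is a standard numerical-stability trick assumed here, not a detail stated in the project:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Turn raw logits into a probability distribution."""
    shifted = z - z.max()        # improves numerical stability
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

logits = np.array([2.0, 0.5, -1.0])
p = softmax(logits)
prediction = int(np.argmax(p))   # class with the highest probability
```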

📉 Loss Function

Categorical cross-entropy measures the difference between predicted probabilities and true labels: L = −Σ yᵢ log(pᵢ). Minimizing this encourages high probability on correct classes.
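
For a one-hot label the sum collapses to the negative log-probability of the true class; a small sketch (the clipping constant is an assumed guard against log(0)):

```python
import numpy as np

def categorical_cross_entropy(y_true: np.ndarray, p_pred: np.ndarray) -> float:
    """L = -sum(y_i * log(p_i)) for a single one-hot example."""
    p_pred = np.clip(p_pred, 1e-12, 1.0)   # avoid log(0)
    return float(-np.sum(y_true * np.log(p_pred)))

y_true = np.array([1.0, 0.0, 0.0])
p_pred = np.array([0.9, 0.08, 0.02])
loss = categorical_cross_entropy(y_true, p_pred)   # ≈ 0.105
```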

PAC-Learnability Analysis

To verify the model is Probably Approximately Correct (PAC) learnable, we calculated the minimum sample size required for given error and failure bounds:

ε (error margin): 0.05
δ (failure probability): 0.05
Required samples: n ≥ 738
Actual training samples: 1,882
Empirical error rate: 0.0044 ✓

The training set exceeds the PAC requirement, and the empirical error (0.44%) is well below the epsilon threshold (5%), confirming successful learning.
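
One standard Hoeffding-style bound, m ≥ ln(2/δ) / (2ε²), reproduces the n ≥ 738 figure exactly; a quick check, assuming this is the bound used (the write-up does not state it explicitly):

```python
import math

epsilon = 0.05   # allowed error margin
delta = 0.05     # allowed failure probability

# Hoeffding-style sample-complexity bound: m >= ln(2/delta) / (2 * epsilon^2)
required_samples = math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))
print(required_samples)          # 738
print(1882 >= required_samples)  # True: the training set exceeds the bound
```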

Results

  • Validation Accuracy: 99.56%
  • Validation Loss: 0.0413
  • Parameters: 6.4M
  • Epochs to Converge: 2

Technical Stack

  • TensorFlow + Keras: Deep learning framework for building and training the MLP architecture.
  • OpenCV: Image processing for cropping and resizing annotated cell regions from microscopy images.
  • Scikit-learn: Data splitting, preprocessing, and evaluation metrics.
  • Pandas + NumPy: Data manipulation and numerical operations for annotation processing.
  • Matplotlib: Visualization of training curves, confusion matrices, and sample predictions.

Future Work

Future iterations should classify white blood cells into specific subtypes (granulocytes, lymphocytes, etc.). Current limitations include the relatively small dataset, lack of fine-grained WBC labels, and controlled image conditions. With 196,608 input neurons (256×256×3) and up to 6.4 million parameters, the architecture could be optimized with convolutional layers for better spatial feature extraction.
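
As an illustration of that direction, a small convolutional front end could replace the flatten-then-dense input stage; the filter counts and kernel sizes below are placeholder choices, not a tested design:

```python
from tensorflow.keras import layers, models

num_classes = 3  # illustrative

cnn = models.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(16, 3, activation="relu"),   # learn local spatial features
    layers.MaxPooling2D(),                     # downsample, cutting parameter count
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),           # far fewer weights than Flatten
    layers.Dense(num_classes, activation="softmax"),
])
```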


Dataset: Blood Cell Detection Dataset (BCDD) by M. Draaslan on Kaggle.