Distribution: Raw vs Normalized
What is Normalization?
Normalization rescales activations inside a neural network so they have zero mean and unit variance, then applies a learnable scale (γ) and shift (β). This stabilizes training, allows higher learning rates, and reduces sensitivity to weight initialization.
Why So Many Types?
Different techniques normalize over different dimensions of the tensor. The choice affects what statistics are shared across batch items, channels, and spatial positions — which matters for CNNs, Transformers, style transfer, and small-batch training.
How to Use
- Click a norm type (BN, LN, IN, GN, RMS) to switch techniques
- Hover a cell to highlight all cells in the same normalization group
- Click a cell to lock selection and inspect the math
- Adjust dimensions in Controls to reshape the tensor
- Check Math tab to see the live mean/variance computation
General Formula
y = γ · (x − μ) / √(σ² + ε) + β
RMSNorm omits mean centering: y = γ · x / √(mean(x²) + ε)
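Both formulas can be sketched in a few lines of NumPy. This is a minimal illustration of the math above, not any particular framework's implementation; the function names and the `axes` parameter (which axes the statistics reduce over) are our own.

```python
import numpy as np

def normalize(x, axes, gamma=1.0, beta=0.0, eps=1e-5):
    # General formula: y = gamma * (x - mu) / sqrt(var + eps) + beta
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def rms_norm(x, axes, gamma=1.0, eps=1e-5):
    # RMSNorm omits mean centering: y = gamma * x / sqrt(mean(x^2) + eps)
    ms = np.mean(x ** 2, axis=axes, keepdims=True)
    return gamma * x / np.sqrt(ms + eps)
```

Picking different `axes` tuples is all that distinguishes the techniques in the table below: the formula never changes, only which cells share μ and σ².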
Reduction Axes per Type
✓ marks an axis the mean/variance statistics are computed over; — marks an axis whose statistics are kept separate; G means the reduction runs within each group of channels.
| Type | Batch (N) | Channel (C) | Spatial (S) |
|---|---|---|---|
| BatchNorm | ✓ | — | ✓ |
| LayerNorm | — | ✓ | ✓ |
| InstanceNorm | — | — | ✓ |
| GroupNorm | — | G | ✓ |
| RMSNorm | — | ✓ | ✓ |
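For a tensor laid out as (N, C, H, W), the table rows translate directly into NumPy axis tuples. A sketch (the dict and helper below are illustrative, not library API):

```python
import numpy as np

x = np.random.default_rng(0).normal(size=(2, 4, 4, 2))  # (N, C, H, W)

# Axes reduced over for each technique (RMSNorm uses LayerNorm's
# axes but skips the mean subtraction).
axes = {
    "BatchNorm":    (0, 2, 3),  # batch + spatial, per channel
    "LayerNorm":    (1, 2, 3),  # channels + spatial, per sample
    "InstanceNorm": (2, 3),     # spatial only, per sample per channel
}

# GroupNorm: split C into groups, reduce within each group + spatial.
def group_norm_stats(x, num_groups):
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    return g.mean(axis=(2, 3, 4)), g.var(axis=(2, 3, 4))
```

The shape of the resulting statistics tells you what is shared: BatchNorm keeps one (μ, σ²) pair per channel, LayerNorm one per sample, InstanceNorm one per (sample, channel), GroupNorm one per (sample, group).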
Key Differences
- BatchNorm computes stats across the batch — depends on batch size
- LayerNorm normalizes per-sample — ideal for Transformers & RNNs
- InstanceNorm normalizes per-channel per-sample — used in style transfer
- GroupNorm splits channels into groups — works with any batch size
- RMSNorm skips mean centering — faster, used in LLaMA/modern LLMs
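The first and last bullets above can be checked directly: a sample's BatchNorm output changes when its batch-mates change, while a per-sample norm such as GroupNorm gives the same answer regardless of batch size. A small sketch (function names are ours; γ and β omitted for brevity):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Stats over batch + spatial axes, shared across the batch per channel.
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def group_norm(x, groups, eps=1e-5):
    # Stats per sample, per channel group -- no batch dependence.
    n, c, h, w = x.shape
    g = x.reshape(n, groups, c // groups, h, w)
    mu = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mu) / np.sqrt(var + eps)).reshape(n, c, h, w)

a = np.random.default_rng(0).normal(size=(4, 8, 4, 4))
sample0 = a[:1]

bn_in_batch = batch_norm(a)[:1]      # sample 0 normalized with batch-mates
bn_alone    = batch_norm(sample0)    # sample 0 normalized alone
gn_in_batch = group_norm(a, 4)[:1]
gn_alone    = group_norm(sample0, 4)
```

Here `gn_in_batch` and `gn_alone` are identical, while `bn_in_batch` and `bn_alone` differ — the batch-size sensitivity that GroupNorm was designed to avoid.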
Live Computation
Hover or click a cell to see the computation breakdown.