What is Regularization?
Regularization adds a penalty term to the loss function that discourages large weights. This prevents overfitting by constraining model complexity.
L1 vs L2: The Key Insight
L1 (Lasso) adds |w0| + |w1| — the constraint region is a diamond. Elliptical loss contours typically first touch the diamond at a corner (on an axis), pushing one or more weights to exactly zero. This produces sparse models.
L2 (Ridge) adds w0² + w1² — the constraint region is a circle. Loss contours hit the circle at a smooth tangent point, shrinking weights toward zero but rarely reaching it.
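This geometric difference has an algorithmic counterpart. The sketch below (not the app's code; step-size-times-λ value and example weights are assumptions for illustration) applies one proximal update per penalty: the L1 update is soft-thresholding, which snaps small weights to exactly zero, while the L2 update is multiplicative shrinkage, which moves weights toward zero without reaching it.

```python
# Minimal sketch: one proximal step per penalty, with t_lam = step size * lambda.

def prox_l1(w, t_lam):
    # Soft-thresholding: weights within t_lam of zero land exactly at zero.
    return [max(abs(wi) - t_lam, 0.0) * (1.0 if wi > 0 else -1.0) for wi in w]

def prox_l2(w, t_lam):
    # Multiplicative shrinkage: weights scale toward zero but never reach it.
    return [wi / (1.0 + 2.0 * t_lam) for wi in w]

w = [0.05, 1.3]          # example weights (assumed for illustration)
print(prox_l1(w, 0.1))   # the small weight snaps to exactly 0.0 — sparsity
print(prox_l2(w, 0.1))   # both weights shrink; neither becomes 0.0
```

Iterating these updates is proximal gradient descent, which is why Lasso solutions contain exact zeros while Ridge solutions merely have small entries.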
How to Use
- Drag the 3D surface to rotate and see the loss landscape from any angle
- Switch L1/L2/Elastic/None to see how the penalty reshapes the surface
- Adjust λ to control regularization strength
- Watch the contour view to see the classic textbook diagram update live
- Change loss eccentricity to see how elongated loss contours interact with constraint shapes
- Click anywhere on the surface to drop a ball and watch gradient descent converge
Objective Functions
min over w:  L(w) + λ · R(w)
| Type | Penalty R(w) | Constraint Shape | Effect |
|---|---|---|---|
| L1 (Lasso) | Σ|wi| | Diamond | Sparse weights |
| L2 (Ridge) | Σwi² | Circle | Small weights |
| Elastic Net | α·L1 + (1-α)·L2 | Rounded diamond | Both |
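The table above can be evaluated directly. A hedged sketch (the quadratic toy loss, the weight vector, and the α default are assumptions, not the app's internals) computes L(w) + λ·R(w) for each penalty type:

```python
# Evaluate the regularized objective L(w) + lam * R(w) at a fixed point.

def objective(w, lam, penalty, alpha=0.5):
    loss = sum((wi - 1.0) ** 2 for wi in w)   # toy quadratic loss, minimized at w = (1, 1)
    l1 = sum(abs(wi) for wi in w)             # diamond constraint region
    l2 = sum(wi ** 2 for wi in w)             # circular constraint region
    r = {"l1": l1,
         "l2": l2,
         "elastic": alpha * l1 + (1 - alpha) * l2,
         "none": 0.0}[penalty]
    return loss + lam * r

w = [0.5, -0.5]                               # example point (assumed)
for p in ("l1", "l2", "elastic", "none"):
    print(p, objective(w, lam=0.8, penalty=p))
```

Raising λ weights the penalty more heavily relative to the loss, which is exactly what the λ slider controls.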
Why L1 Produces Sparsity
Geometrically: the diamond corners of the L1 constraint protrude along the axes. Elliptical loss contours are most likely to first touch the diamond at a corner, where one weight is exactly zero. The more elongated the loss contours, the stronger this effect.
Analytically: the gradient of the L1 penalty |w| is ±1 for any nonzero w, a constant force toward zero regardless of weight magnitude. The gradient of the L2 penalty w² is 2w, which weakens as w approaches zero — so L2 shrinks weights ever more gently without zeroing them out.
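The analytic argument can be checked numerically. A minimal sketch (the sample weight values are assumptions) compares the penalty-gradient magnitudes as |w| shrinks:

```python
# Penalty-gradient magnitude as the weight approaches zero.

def l1_grad(w):
    # Subgradient of |w|: constant +/-1 away from zero.
    return 1.0 if w > 0 else -1.0 if w < 0 else 0.0

def l2_grad(w):
    # Gradient of w^2: proportional to w, so it vanishes near zero.
    return 2.0 * w

for w in (1.0, 0.1, 0.01):
    print(w, abs(l1_grad(w)), abs(l2_grad(w)))  # L1 force stays 1.0; L2 force fades
```

At w = 0.01 the L1 pull is still 1, while the L2 pull has dropped to 0.02 — which is why only L1 can drive weights all the way to zero in finite time.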