
What is Regularization?

Regularization adds a penalty term to the loss function that discourages large weights. This prevents overfitting by constraining model complexity.

L1 vs L2: The Key Insight

L1 (Lasso) adds |w0| + |w1| — the constraint region is a diamond. Expanding loss contours tend to first touch the diamond at a corner (on an axis), driving some weights to exactly zero. This produces sparse models.

L2 (Ridge) adds w0² + w1² — the constraint region is a circle. Loss contours hit the circle at a smooth tangent point, shrinking weights toward zero but rarely reaching it.
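The "exactly zero vs. rarely zero" contrast can be checked in one dimension, where both penalized problems have closed-form minimizers. This is an illustrative numpy sketch (the values of a and λ are arbitrary assumptions, not parameters from the demo):

```python
import numpy as np

def l1_min(a, lam):
    # Closed-form minimizer of (w - a)^2 / 2 + lam * |w|:
    # the soft-threshold operator, which snaps small weights to exactly 0.
    return np.sign(a) * max(abs(a) - lam, 0.0)

def l2_min(a, lam):
    # Closed-form minimizer of (w - a)^2 / 2 + lam * w^2:
    # uniform shrinkage toward 0, but never exactly 0 for a != 0.
    return a / (1.0 + 2.0 * lam)

for a in (0.3, 1.5):
    print(f"a={a}: L1 -> {l1_min(a, 0.5):.3f}, L2 -> {l2_min(a, 0.5):.3f}")
# a=0.3: L1 -> 0.000, L2 -> 0.150
# a=1.5: L1 -> 1.000, L2 -> 0.750
```

Note how L1 sends the small weight (a = 0.3) to exactly zero while L2 only halves it.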

How to Use

  • Drag the 3D surface to rotate and see the loss landscape from any angle
  • Switch L1/L2/Elastic/None to see how the penalty reshapes the surface
  • Adjust λ to control regularization strength
  • Watch the contour view to see the classic textbook diagram update live
  • Change loss eccentricity to see how elongated loss contours interact with constraint shapes
  • Click anywhere on the surface to drop a ball and watch gradient descent converge
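The "drop a ball" step is plain gradient descent on the regularized objective. A minimal sketch, assuming a hypothetical quadratic loss with an eccentric Hessian A (elongated contours), center c, λ = 0.5, and learning rate 0.05 — illustrative stand-ins, not the demo's actual parameters:

```python
import numpy as np

# Hypothetical elliptical loss L(w) = 0.5 * (w - c) @ A @ (w - c),
# plus an L2 penalty lam * ||w||^2.
A = np.array([[4.0, 0.0], [0.0, 1.0]])   # eccentric Hessian -> elongated contours
c = np.array([1.5, 1.0])                 # unregularized optimum
lam, lr = 0.5, 0.05

w = np.array([2.0, -1.0])                # starting point ("where the ball drops")
for _ in range(1000):
    grad = A @ (w - c) + 2.0 * lam * w   # loss gradient + L2 penalty gradient
    w = w - lr * grad

print(np.round(w, 3))                    # regularized optimum, pulled toward 0
```

The ball settles at (1.2, 0.5) rather than the unregularized optimum (1.5, 1.0): the penalty pulls each coordinate toward the origin, more strongly along the shallow axis.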

Objective Functions

min L(w) + λ · R(w)
Type           Penalty R(w)        Constraint shape    Effect
L1 (Lasso)     Σ|wᵢ|               Diamond             Sparse weights
L2 (Ridge)     Σwᵢ²                Circle              Small weights
Elastic Net    α·L1 + (1−α)·L2     Rounded diamond     Both
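As a quick check of the penalties above, here is a numpy sketch evaluating each R(w) on a sample weight vector (the vector and the mixing weight α are arbitrary assumptions):

```python
import numpy as np

w = np.array([0.8, -0.3])           # sample weight vector (assumed)
alpha = 0.5                         # elastic-net mixing weight (assumed)

l1 = np.sum(np.abs(w))              # Σ|wi|  -> diamond constraint region
l2 = np.sum(w ** 2)                 # Σwi²   -> circular constraint region
elastic = alpha * l1 + (1 - alpha) * l2

print(f"L1={l1:.3f}, L2={l2:.3f}, ElasticNet={elastic:.3f}")
# L1=1.100, L2=0.730, ElasticNet=0.915
```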

Why L1 Produces Sparsity

Geometrically: the diamond corners of the L1 constraint protrude along the axes. Elliptical loss contours are most likely to first touch the diamond at a corner, where one weight is exactly zero. The more elongated the loss contours, the stronger this effect.

Analytically: the L1 penalty's gradient is ±1 regardless of weight magnitude (it is non-differentiable at zero), providing a constant force toward zero. L2's gradient is 2w, which weakens as w approaches zero — the pull fades just as the weight gets small, so it never quite reaches zero.
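The gradient argument can be seen numerically: the L1 pull stays constant while the L2 pull decays with the weight. A small sketch with λ = 1 (an arbitrary choice for illustration):

```python
import numpy as np

lam = 1.0
for w in (1.0, 0.1, 0.01):
    g_l1 = lam * np.sign(w)        # constant-magnitude pull toward zero
    g_l2 = lam * 2.0 * w           # pull weakens as w shrinks
    print(f"w={w:>5}: |L1 grad|={abs(g_l1):.2f}, |L2 grad|={abs(g_l2):.2f}")
# w=  1.0: |L1 grad|=1.00, |L2 grad|=2.00
# w=  0.1: |L1 grad|=1.00, |L2 grad|=0.20
# w= 0.01: |L1 grad|=1.00, |L2 grad|=0.02
```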

Current Optimum

Adjust parameters to see the computation.