What is Regularization?
Regularization adds a penalty term to the loss function that discourages large weights. This prevents overfitting by constraining model complexity.
L1 vs L2: The Key Insight
L1 (Lasso) adds |w0| + |w1| — the constraint region is a diamond. Elliptical loss contours typically first touch the diamond at a corner (on an axis), pushing one or more weights to exactly zero. This produces sparse models.
L2 (Ridge) adds w0² + w1² — the constraint region is a circle. Loss contours hit the circle at a smooth tangent point, shrinking weights toward zero but rarely reaching it.
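This geometric difference has an algorithmic counterpart. The sketch below (not the app's code; step-size-times-λ value and example weights are assumptions for illustration) applies one proximal update per penalty: the L1 update is soft-thresholding, which snaps small weights to exactly zero, while the L2 update is multiplicative shrinkage, which moves weights toward zero without reaching it.

```python
# Minimal sketch: one proximal step per penalty, with t_lam = step size * lambda.

def prox_l1(w, t_lam):
    # Soft-thresholding: weights within t_lam of zero land exactly at zero.
    return [max(abs(wi) - t_lam, 0.0) * (1.0 if wi > 0 else -1.0) for wi in w]

def prox_l2(w, t_lam):
    # Multiplicative shrinkage: weights scale toward zero but never reach it.
    return [wi / (1.0 + 2.0 * t_lam) for wi in w]

w = [0.05, 1.3]          # example weights (assumed for illustration)
print(prox_l1(w, 0.1))   # the small weight snaps to exactly 0.0 — sparsity
print(prox_l2(w, 0.1))   # both weights shrink; neither becomes 0.0
```

Iterating these updates is proximal gradient descent, which is why Lasso solutions contain exact zeros while Ridge solutions merely have small entries.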
How to Use
- Drag the 3D surface to rotate and see the loss landscape from any angle
- Switch L1/L2/Elastic/None to see how the penalty reshapes the surface
- Adjust λ to control regularization strength
- Watch the contour view to see the classic textbook diagram update live
- Change loss eccentricity to see how elongated loss contours interact with constraint shapes
- Click anywhere on the surface to drop a ball and watch gradient descent converge
Objective Functions
min over w:  L(w) + λ · R(w)
| Type | Penalty R(w) | Constraint Shape | Effect |
|---|---|---|---|
| L1 (Lasso) | Σ|wi| | Diamond | Sparse weights |
| L2 (Ridge) | Σwi² | Circle | Small weights |
| Elastic Net | α·L1 + (1-α)·L2 | Rounded diamond | Both |
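The table above can be evaluated directly. A hedged sketch (the quadratic toy loss, the weight vector, and the α default are assumptions, not the app's internals) computes L(w) + λ·R(w) for each penalty type:

```python
# Evaluate the regularized objective L(w) + lam * R(w) at a fixed point.

def objective(w, lam, penalty, alpha=0.5):
    loss = sum((wi - 1.0) ** 2 for wi in w)   # toy quadratic loss, minimized at w = (1, 1)
    l1 = sum(abs(wi) for wi in w)             # diamond constraint region
    l2 = sum(wi ** 2 for wi in w)             # circular constraint region
    r = {"l1": l1,
         "l2": l2,
         "elastic": alpha * l1 + (1 - alpha) * l2,
         "none": 0.0}[penalty]
    return loss + lam * r

w = [0.5, -0.5]                               # example point (assumed)
for p in ("l1", "l2", "elastic", "none"):
    print(p, objective(w, lam=0.8, penalty=p))
```

Raising λ weights the penalty more heavily relative to the loss, which is exactly what the λ slider controls.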
Why L1 Produces Sparsity
Geometrically: the diamond corners of the L1 constraint protrude along the axes. Elliptical loss contours are most likely to first touch the diamond at a corner, where one weight is exactly zero. The more elongated the loss contours, the stronger this effect.
Analytically: the gradient of the L1 penalty |w| is ±1 for any nonzero w, a constant force toward zero regardless of weight magnitude. The gradient of the L2 penalty w² is 2w, which weakens as w approaches zero — so L2 shrinks weights ever more gently without zeroing them out.
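The analytic argument can be checked numerically. A minimal sketch (the sample weight values are assumptions) compares the penalty-gradient magnitudes as |w| shrinks:

```python
# Penalty-gradient magnitude as the weight approaches zero.

def l1_grad(w):
    # Subgradient of |w|: constant +/-1 away from zero.
    return 1.0 if w > 0 else -1.0 if w < 0 else 0.0

def l2_grad(w):
    # Gradient of w^2: proportional to w, so it vanishes near zero.
    return 2.0 * w

for w in (1.0, 0.1, 0.01):
    print(w, abs(l1_grad(w)), abs(l2_grad(w)))  # L1 force stays 1.0; L2 force fades
```

At w = 0.01 the L1 pull is still 1, while the L2 pull has dropped to 0.02 — which is why only L1 can drive weights all the way to zero in finite time.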