What is Shannon Entropy?
Shannon Entropy measures the average "surprise" or uncertainty in a probability distribution. It is named after Claude Shannon, the founder of information theory.
Key Intuitions
- Uncertainty: Higher entropy = more uncertainty about outcomes
- Bits: Entropy tells you minimum bits needed to encode messages
- Fair coin: Maximum uncertainty for 2 outcomes = 1 bit
- Certain outcome: One bar at 100% = 0 bits (no surprise); see the quick check below
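A quick way to verify the last two intuitions is to compute the entropy directly. This is a minimal Python sketch (not code from the app), using the same formula developed later in this section:

import math

def entropy(probabilities):
    # H = -sum of p * log2(p), skipping zero-probability outcomes
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy([1.0]))       # certain outcome: 0.0 bits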
How to Use
- Drag bars up/down to adjust probabilities
- Select presets to see classic examples
- Add outcomes to explore more complex distributions
- Normalize to fix probabilities that don't sum to 100% (a quick sketch of this step follows the list)
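Here is a minimal sketch of what normalization does numerically; it assumes Normalize simply rescales all bars proportionally so they sum to 100%, which may differ from the app's exact behavior:

def normalize(probabilities):
    # Rescale values so they sum to 1 (i.e. 100%)
    total = sum(probabilities)
    return [p / total for p in probabilities]

print(normalize([0.2, 0.2, 0.2]))  # three equal values of 1/3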
Computing Entropy
- For each outcome i with probability p(i)
- Calculate its "surprise": -log2(p(i))
- Weight by probability: p(i) × surprise
- Sum all weighted surprises
import math

def entropy(probabilities):
    """Shannon entropy (in bits) of a list of probabilities."""
    H = 0.0
    for p in probabilities:
        if p > 0:  # skip zeros: the 0 * log2(0) term is treated as 0
            H -= p * math.log2(p)
    return H
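For example, calling this function on a few distributions (values shown are rounded):

print(entropy([0.5, 0.5]))                # 1.0 bit (fair coin)
print(entropy([0.9, 0.1]))                # ~0.469 bits (biased coin)
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits (uniform over 4 outcomes)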
Why log2?
Using base 2 gives units in bits. A fair coin flip has 1 bit of entropy: you need exactly 1 binary digit to encode heads/tails.
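The base only changes the unit, not the measure itself. A minimal sketch (again, not app code) comparing base 2 (bits) with the natural logarithm (nats) for a fair coin:

import math

p = [0.5, 0.5]
bits = -sum(q * math.log2(q) for q in p)  # 1.0 bit
nats = -sum(q * math.log(q) for q in p)   # ~0.693 nats, i.e. 1 bit × ln(2)
print(bits, nats)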
Surprise / Self-Information
The "surprise" of seeing outcome x is -log2(p(x)).
Rare events are more surprising. Entropy is the expected surprise.
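To see how surprise grows as outcomes get rarer, here is a minimal sketch evaluating -log2(p) for a few probabilities:

import math

for p in [0.5, 0.25, 0.1, 0.01]:
    print(f"p = {p}: surprise = {-math.log2(p):.2f} bits")
# p = 0.5 -> 1.00, p = 0.25 -> 2.00, p = 0.1 -> 3.32, p = 0.01 -> 6.64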
The Shannon Entropy Formula
H(X) = -Σ p(x) log2 p(x)
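For example, a biased coin with p(heads) = 0.9 and p(tails) = 0.1:
H = -(0.9 × log2 0.9 + 0.1 × log2 0.1) ≈ -(0.9 × -0.152 + 0.1 × -3.322) ≈ 0.469 bits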
Current Calculation
(Shows the entropy of the distribution currently set by the bars.)
Properties
- Non-negative: H(X) ≥ 0 always
- Maximum: H(X) ≤ log2(n) for n outcomes
- Achieved when: uniform distribution (all p equal)
- Zero when: one outcome has p = 1 (certainty); see the numerical check below
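A minimal numerical check of these properties, using the same entropy function as in the code above (a sketch, not app code):

import math

def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

n = 4
uniform = [1 / n] * n
print(entropy(uniform), math.log2(n))  # both 2.0: the maximum for 4 outcomes
print(entropy([1.0, 0.0, 0.0, 0.0]))   # 0.0: one certain outcome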
Sample from Distribution
Draw random samples from the current distribution to see "surprise" values. With many samples, the average surprise approaches the entropy!
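The same experiment can be sketched offline. The distribution and sample count below are arbitrary illustrative choices, not values from the app:

import math
import random

probs = [0.5, 0.3, 0.2]
H = -sum(p * math.log2(p) for p in probs)

# Draw many samples and average their surprise values
samples = random.choices(range(len(probs)), weights=probs, k=100_000)
avg_surprise = sum(-math.log2(probs[i]) for i in samples) / len(samples)

print(H, avg_surprise)  # the average surprise should be close to the entropy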