[Interactive chart: Probability Distribution, with a live entropy readout (e.g. H = 1.000 bits)]

What is Shannon Entropy?

Shannon Entropy measures the average "surprise", or uncertainty, in a probability distribution. It is named after Claude Shannon, the founder of information theory.

Key Intuitions

  • Uncertainty: Higher entropy = more uncertainty about outcomes
  • Bits: Entropy tells you the minimum number of bits needed, on average, to encode messages from the distribution (see the sketch after this list)
  • Fair coin: Maximum uncertainty for 2 outcomes = 1 bit
  • Certain outcome: One bar at 100% = 0 bits (no surprise)
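A quick numeric illustration of the "bits" intuition, written as a minimal Python sketch rather than part of the demo: for n equally likely outcomes the entropy is log2(n) bits.

from math import log2

# For n equally likely outcomes, entropy = log2(n) bits:
print(log2(2))   # fair coin: 1.0 bit
print(log2(8))   # 8 equally likely outcomes: 3.0 bits
# A certain outcome (one bar at 100%) carries 0 bits: there is nothing left to encode.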

How to Use

  • Drag bars up/down to adjust probabilities
  • Select presets to see classic examples
  • Add outcomes to explore more complex distributions
  • Normalize to fix probabilities that don't sum to 100% (a sketch of this step follows the list)
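The Normalize action most likely rescales the bars so they sum to 1; the snippet below is a minimal sketch of that idea (an assumption about the demo, not its actual code), including a hypothetical fallback for the all-zero case.

def normalize(values):
    # Scale raw bar values so they sum to 1 (i.e., 100%).
    total = sum(values)
    if total == 0:
        # Hypothetical fallback: spread probability evenly if every bar is zero.
        return [1 / len(values)] * len(values)
    return [v / total for v in values]

print(normalize([2, 1, 1]))  # -> [0.5, 0.25, 0.25]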

Computing Entropy

  1. For each outcome i with probability pᵢ
  2. Calculate its "surprise": -log2(pᵢ)
  3. Weight by probability: pᵢ × surprise
  4. Sum all weighted surprises
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    H = 0.0
    for p in probabilities:
        if p > 0:  # treat 0 * log2(0) as 0 by skipping zero-probability outcomes
            H -= p * log2(p)
    return H
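
A few example calls using the entropy function above, with distributions similar to the classic examples mentioned earlier:

print(entropy([0.5, 0.5]))          # fair coin -> 1.0 bit
print(entropy([0.25] * 4))          # fair 4-sided die -> 2.0 bits
print(entropy([1.0, 0.0]))          # certain outcome -> 0.0 bits
print(entropy([0.7, 0.2, 0.1]))     # skewed 3-outcome distribution -> ~1.157 bits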

Why log2?

Using base 2 gives units in bits. A fair coin flip has 1 bit of entropy: you need exactly one binary digit to encode heads or tails.
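To see why the base matters, here is a small comparison (a sketch; the demo itself reports bits only): the same distribution measured with log2 gives bits, with the natural log it gives nats, and the two differ by a factor of ln(2).

from math import log, log2

probs = [0.5, 0.5]  # fair coin
h_bits = -sum(p * log2(p) for p in probs if p > 0)
h_nats = -sum(p * log(p) for p in probs if p > 0)

print(h_bits)            # 1.0 bit
print(h_nats)            # ~0.693 nats
print(h_nats / log(2))   # converting nats back to bits -> 1.0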

Surprise / Self-Information

The "surprise" of seeing outcome x is -log2(p(x)). Rare events are more surprising. Entropy is the expected surprise.

The Shannon Entropy Formula

H(X) = -Σ p(x) log2 p(x)

Current Calculation

[Interactive panel: updates with the term-by-term entropy calculation as you adjust the distribution]
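As a static stand-in for that live panel, the sketch below expands the entropy sum term by term for a biased coin with p(heads) = 0.9 and p(tails) = 0.1 (a hypothetical example, not necessarily one of the demo's presets).

from math import log2

dist = {"heads": 0.9, "tails": 0.1}

H = 0.0
for outcome, p in dist.items():
    term = -p * log2(p)            # p(x) times the surprise of x
    print(outcome, p, term)        # heads: ~0.137, tails: ~0.332
    H += term

print(H)                           # ~0.469 bits, well below the fair coin's 1 bit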

Properties

  • Non-negative: H(X) ≥ 0 always
  • Maximum: H(X) ≤ log2(n) for n outcomes
  • Achieved when: uniform distribution (all p equal)
  • Zero when: one outcome has p = 1 (certainty; see the numeric check below)
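These properties can be spot-checked numerically. The sketch below re-defines the entropy function so it runs on its own and tests random distributions against the bounds (not part of the demo).

import random
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

n = 4
print(entropy([1 / n] * n), log2(n))   # uniform hits the maximum: both are 2.0
print(entropy([1.0, 0.0, 0.0, 0.0]))   # certainty gives 0.0

for _ in range(5):
    raw = [random.random() for _ in range(n)]
    probs = [r / sum(raw) for r in raw]
    H = entropy(probs)
    assert 0 <= H <= log2(n) + 1e-9     # non-negative and bounded by log2(n)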

Sample from Distribution

Draw random samples from the current distribution to see "surprise" values. With many samples, the average surprise approaches the entropy!

[Interactive panel: click "Sample" to draw outcomes; it tracks the last sample's surprise, the total sample count, and the running average surprise]
Insight: As you collect more samples, the average surprise converges to the entropy H(X). This is the law of large numbers in action!
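
A sketch of the sampling experiment this panel describes, assuming outcomes are drawn in proportion to their probabilities: the average per-sample surprise approaches H(X).

import random
from math import log2

probs = [0.5, 0.25, 0.25]                     # example distribution
H = -sum(p * log2(p) for p in probs)          # exact entropy: 1.5 bits

samples = random.choices(range(len(probs)), weights=probs, k=100_000)
avg_surprise = sum(-log2(probs[i]) for i in samples) / len(samples)

print(H, avg_surprise)   # the average surprise lands close to 1.5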