RNN Architecture

A Recurrent Neural Network processes sequences one token at a time. At each timestep, the input token is embedded, combined with the previous hidden state, and transformed to produce an output prediction and a new hidden state.

The hidden state h_t carries information from all previous tokens forward through the sequence; this is what makes the network recurrent.
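The recurrent update described above can be sketched in a few lines of NumPy. The weight values and dimensions here are hypothetical (randomly initialized, purely for illustration), not the ones used by the visualization:

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 8, 16   # illustrative sizes, not the app's defaults

# Hypothetical weights, randomly initialized for illustration only
W_xh = rng.standard_normal((hidden_dim, embed_dim)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # hidden -> hidden
b_h = np.zeros(hidden_dim)

def step(x_t, h_prev):
    """One recurrent update: combine the embedded input with the
    previous hidden state to produce the new hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_dim)             # h_0: empty memory before the sequence
x = rng.standard_normal(embed_dim)   # an embedded input token
h = step(x, h)                       # h_1 now summarizes the sequence so far
```

The same `step` function is applied at every timestep; only the inputs and the carried hidden state change.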

How to Use

  • Click a block on the diagram to expand it and see internal weights, matrices, and computations
  • Click again or click empty space to collapse back to the overview
  • Hover over any block for a quick summary tooltip
  • Step through timesteps in the Controls tab to see how hidden state evolves
  • Change dimensions to see how network size affects weight matrices

Forward Pass

  1. Embed: Look up the input character in the embedding matrix E to get x_t
  2. Hidden state: Compute h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h)
  3. Output: Compute logits y_t = W_hy·h_t + b_y
  4. Softmax: Convert logits to probabilities over the vocabulary
  5. Predict: Select the highest-probability character
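The five steps above can be sketched end to end in NumPy. All parameters here are hypothetical, randomly initialized stand-ins (the vocabulary size and dimensions are assumptions for the example, not the app's actual values):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, embed_dim, hidden_dim = 5, 4, 8   # illustrative sizes

# Hypothetical parameters, random for illustration only
E    = rng.standard_normal((vocab_size, embed_dim)) * 0.1   # embedding matrix
W_xh = rng.standard_normal((hidden_dim, embed_dim)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_hy = rng.standard_normal((vocab_size, hidden_dim)) * 0.1
b_h  = np.zeros(hidden_dim)
b_y  = np.zeros(vocab_size)

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(token_ids):
    """Run the forward pass over a sequence; predict the next character."""
    h = np.zeros(hidden_dim)                     # h_0
    for t in token_ids:
        x = E[t]                                 # 1. embed
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # 2. hidden state
    logits = W_hy @ h + b_y                      # 3. output
    probs = softmax(logits)                      # 4. softmax
    return int(np.argmax(probs)), probs          # 5. predict

pred, probs = forward([0, 3, 1])
```

Note that steps 1 and 2 repeat for every token, while steps 3 through 5 are shown here only for the final timestep; during training the output is typically computed at every step.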

Recurrence

The key insight is that h_t depends on h_{t-1}, which depends on h_{t-2}, and so on. This chain allows the network to maintain a “memory” of the entire sequence seen so far.

h_0 → h_1 → h_2 → … → h_T
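One way to see this chain in action is to perturb only the first input and observe that the final hidden state still changes. This is a minimal sketch with hypothetical random weights and made-up dimensions, not the network in the visualization:

```python
import numpy as np

rng = np.random.default_rng(2)
hidden_dim, embed_dim = 8, 4   # illustrative sizes

# Hypothetical weights, random for illustration only
W_xh = rng.standard_normal((hidden_dim, embed_dim)) * 0.5
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.5

def run(xs):
    """Unroll the recurrence over a whole sequence of embedded inputs."""
    h = np.zeros(hidden_dim)        # h_0
    for x in xs:                    # h_t = tanh(W_xh·x_t + W_hh·h_{t-1})
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h                        # h_T

seq = [rng.standard_normal(embed_dim) for _ in range(6)]
h_T = run(seq)

# Perturb only the FIRST token: the final state still moves, because
# information propagates through the whole chain h_0 → h_1 → … → h_T.
seq2 = [seq[0] + 0.1] + seq[1:]
h_T2 = run(seq2)
diff = np.abs(h_T - h_T2).max()     # nonzero: the early token left a trace
```

In practice this influence can fade over long sequences (the vanishing-gradient problem), which is one motivation for gated variants like LSTMs and GRUs.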

Live Computation

Step through timesteps to see live computations here.