RNN Architecture
A Recurrent Neural Network (RNN) processes a sequence one token at a time. At each timestep, the input token is embedded, combined with the previous hidden state, and transformed to produce an output prediction and a new hidden state.
The hidden state h_t carries information from all previous tokens forward through the sequence; this is what makes the network recurrent.
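The recurrent loop described above can be sketched in a few lines of NumPy. The dimensions and random weights here are illustrative assumptions, not the values used in the visualization:

```python
import numpy as np

# Illustrative sizes (assumptions, not the app's actual settings)
vocab_size, embed_dim, hidden_dim = 5, 4, 8
rng = np.random.default_rng(0)

E = rng.normal(size=(vocab_size, embed_dim))      # embedding matrix
W_xh = rng.normal(size=(hidden_dim, embed_dim))   # input -> hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden (the recurrence)
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)          # h_0: initial hidden state
for token in [0, 3, 1, 4]:        # a toy token sequence
    x = E[token]                  # embed the current input token
    # The new hidden state mixes the current input with the previous state,
    # so information from earlier tokens is carried forward.
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)

print(h.shape)  # (8,)
```

Because `h` is fed back into the update at every step, the final vector is a function of the entire input sequence, not just the last token.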
How to Use
- Click a block on the diagram to expand it and see internal weights, matrices, and computations
- Click again or click empty space to collapse back to the overview
- Hover over any block for a quick summary tooltip
- Step through timesteps in the Controls tab to see how hidden state evolves
- Change dimensions to see how network size affects weight matrices
Forward Pass
- Embed: Look up the input character in the embedding matrix E to get x_t
- Hidden state: Compute h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h)
- Output: Compute logits y_t = W_hy·h_t + b_y
- Softmax: Convert logits to probabilities over the vocabulary
- Predict: Select the highest-probability character
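The five steps above can be written as a single `step` function. This is a minimal sketch with randomly initialized weights and made-up dimensions, not the trained network from the demo:

```python
import numpy as np

# Illustrative sizes (assumptions)
vocab_size, embed_dim, hidden_dim = 5, 4, 8
rng = np.random.default_rng(1)

E    = rng.normal(size=(vocab_size, embed_dim))   # embedding matrix
W_xh = rng.normal(size=(hidden_dim, embed_dim))   # input -> hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden
W_hy = rng.normal(size=(vocab_size, hidden_dim))  # hidden -> output
b_h  = np.zeros(hidden_dim)
b_y  = np.zeros(vocab_size)

def step(token, h_prev):
    x = E[token]                                   # 1. embed
    h = np.tanh(W_xh @ x + W_hh @ h_prev + b_h)    # 2. hidden state
    logits = W_hy @ h + b_y                        # 3. output logits
    e = np.exp(logits - logits.max())              # 4. softmax (numerically stable)
    probs = e / e.sum()
    return h, probs

h = np.zeros(hidden_dim)
h, probs = step(2, h)
pred = int(np.argmax(probs))                       # 5. predict: highest-probability character
print(round(float(probs.sum()), 6))  # 1.0 — softmax outputs a valid distribution
```

Subtracting `logits.max()` before exponentiating does not change the softmax result but prevents overflow for large logits.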
Recurrence
The key insight is that h_t depends on h_{t-1}, which depends on h_{t-2}, and so on. This chain allows the network to maintain a “memory” of the entire sequence seen so far.
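One way to see this memory in action: run the same sequence twice, changing only the first token, and compare the final hidden states. With the small random weights used in this sketch (an assumption for illustration), the early token still leaves a trace in h_t several steps later:

```python
import numpy as np

# Illustrative sizes and random weights (assumptions)
vocab_size, embed_dim, hidden_dim = 5, 4, 8
rng = np.random.default_rng(2)
E    = rng.normal(size=(vocab_size, embed_dim))
W_xh = rng.normal(size=(hidden_dim, embed_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))

def final_hidden(tokens):
    # Unroll the recurrence h_t = tanh(W_xh·x_t + W_hh·h_{t-1})
    h = np.zeros(hidden_dim)
    for t in tokens:
        h = np.tanh(W_xh @ E[t] + W_hh @ h)
    return h

h_a = final_hidden([0, 1, 2, 3])
h_b = final_hidden([4, 1, 2, 3])   # identical except for the first token
print(np.allclose(h_a, h_b))       # the two final states differ
```

In practice this trace can fade over long sequences (the vanishing-gradient problem), which is one motivation for gated variants such as LSTMs and GRUs.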