Core Concepts

Autoregressive language models generate text one token at a time, where each new token depends on all previous tokens in the sequence.

This approach enables the model to capture long-range dependencies and produce coherent, contextually relevant text.
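
To make the loop concrete, here is a minimal sketch of autoregressive decoding in Python. The model itself is replaced by a hypothetical next_token_distribution function that returns a toy distribution; a real model would compute that distribution from all previous tokens.

```python
import numpy as np

# Toy vocabulary; a real model works over tens of thousands of subword tokens.
VOCAB = ["Once", "upon", "a", "time", "<eos>"]

def next_token_distribution(tokens):
    # Hypothetical stand-in for a model forward pass. A real model would
    # condition on every token in `tokens`; this toy version is uniform.
    return np.full(len(VOCAB), 1.0 / len(VOCAB))

def generate(prompt_tokens, max_new_tokens=5, seed=0):
    rng = np.random.default_rng(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_distribution(tokens)    # depends on ALL previous tokens
        next_id = rng.choice(len(VOCAB), p=probs)  # sample the next token
        tokens.append(VOCAB[next_id])
        if tokens[-1] == "<eos>":                  # stop at the end-of-sequence token
            break
    return tokens

print(generate(["Once", "upon", "a"]))
```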

Autoregressive Formula

P(x) = \prod_{t=1}^{T} P(x_t \mid x_{<t})

By the chain rule, the probability of a sequence is the product of each token's probability conditioned on all tokens that precede it.
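
As a toy illustration of this factorization, here is the calculation for a three-token sequence; the conditional probabilities below are made-up numbers, not values from any actual model.

```python
# Illustrative numbers only: the chain-rule factorization of a 3-token sequence.
# P("Once upon a") = P("Once") * P("upon" | "Once") * P("a" | "Once upon")
p_x1 = 0.020   # P(x1 = "Once")
p_x2 = 0.400   # P(x2 = "upon" | x1)
p_x3 = 0.650   # P(x3 = "a" | x1, x2)

p_sequence = p_x1 * p_x2 * p_x3
print(f"P(sequence) = {p_sequence:.4f}")   # 0.0052
```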

Model Features

  • Generates text sequentially, one token at a time
  • Each prediction depends on all previous context
  • Uses causal masking to prevent future information leakage (see the sketch after this list)
  • Probabilistic output allows for creative variations
  • Sampling parameters such as temperature, top-k, and top-p control randomness and diversity
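
The causal mask mentioned above can be sketched in a few lines of NumPy: positions above the diagonal are set to minus infinity so that, after the softmax, each token assigns zero attention weight to tokens that come after it. The 3-token sequence length matches the demo's prompt; everything else here is illustrative.

```python
import numpy as np

def causal_mask(seq_len):
    # Strictly upper-triangular entries correspond to "future" tokens.
    return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

scores = np.zeros((3, 3))          # pretend attention scores for a 3-token prompt
masked = scores + causal_mask(3)   # -inf above the diagonal

print(softmax(masked))
# Row 0 attends only to token 0; row 2 spreads its weight over tokens 0-2.
```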

Transformer Visualization

[Interactive panel: animates each token's path through the model (Embedding → Attention → FFN → Softmax), showing the current generation step t and the prediction confidence; each prediction depends only on previous tokens.]

Generation Controls

[Interactive panel: a view-mode toggle and sliders with current values 8, 12, 1.0, 50, and 0.9.]
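
Assuming the sampling-related sliders are the usual temperature, top-k, and top-p controls (the 1.0, 50, and 0.9 values match common defaults for those settings), a sketch of how the three controls reshape a next-token distribution might look like this; the logits are toy numbers.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.9, seed=0):
    """Sketch of common sampling controls: temperature, top-k, and top-p (nucleus)."""
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=np.float64) / temperature   # temperature scaling

    # Top-k: keep only the k highest-scoring tokens.
    if top_k is not None and top_k < logits.size:
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative probability >= p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    probs = mask / mask.sum()

    return rng.choice(probs.size, p=probs)

logits = [2.0, 1.0, 0.5, 0.1, -1.0]   # toy scores for a 5-token vocabulary
print(sample_next_token(logits, temperature=1.0, top_k=3, top_p=0.9))
```

Lower temperatures sharpen the distribution toward the most likely token, while top-k and top-p both truncate the long tail before sampling.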

Model Architecture

Embedding Layer

Maps input tokens to vector space
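
A minimal sketch of the lookup, with toy sizes and random weights standing in for trained ones (the demo's real shapes, vocabulary 50257 and d_model 4096, appear in the Tensor Shapes section below):

```python
import numpy as np

# Each token id selects one row of a learned weight matrix.
vocab_size, d_model = 100, 8
rng = np.random.default_rng(0)
embedding_table = rng.normal(scale=0.02, size=(vocab_size, d_model))

token_ids = np.array([[17, 42, 3]])        # arbitrary ids for a 3-token prompt, batch of 1
embeddings = embedding_table[token_ids]    # shape [1, 3, 8]  (real model: [1, 3, 4096])
print(embeddings.shape)
```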

Positional Encoding

Adds positional information using sinusoidal (sine and cosine) functions of the token position
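
Assuming the standard sinusoidal scheme from the original Transformer paper (sine on even dimensions, cosine on odd ones, at geometrically spaced frequencies), a sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]              # [seq_len, 1]
    dims = np.arange(0, d_model, 2)[None, :]             # even dimension indices
    angles = positions / (10000 ** (dims / d_model))     # [seq_len, d_model/2]
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                         # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)                         # cosine on odd dimensions
    return pe

pe = positional_encoding(seq_len=3, d_model=8)
print(pe.shape)   # (3, 8); added element-wise to the token embeddings
```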

Transformer Blocks

A stack of layers, each combining multi-head attention with a feed-forward network
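
A structural sketch of the stack, using identity placeholders for attention and the feed-forward network (real versions follow in the next sections) and assuming a pre-norm layout with residual connections; the depth here is arbitrary:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def attention(x):       # identity placeholder for multi-head self-attention
    return x

def feed_forward(x):    # identity placeholder for the position-wise FFN
    return x

def transformer_block(x):
    x = x + attention(layer_norm(x))       # residual connection around attention
    x = x + feed_forward(layer_norm(x))    # residual connection around the FFN
    return x

x = np.random.default_rng(0).normal(size=(1, 3, 8))   # [batch, seq, d_model]
for _ in range(4):                                     # arbitrary toy depth
    x = transformer_block(x)
print(x.shape)                                         # (1, 3, 8)
```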

Multi-Head Attention

Captures relationships between tokens
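
A toy multi-head self-attention with random projection weights and the causal mask from earlier; the sizes are small stand-ins for the demo's real shapes.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Scaled dot-product attention across several heads, with a causal mask."""
    batch, seq, d_model = x.shape
    d_head = d_model // num_heads
    wq, wk, wv, wo = (rng.normal(scale=0.02, size=(d_model, d_model)) for _ in range(4))

    def split_heads(t):   # [batch, seq, d_model] -> [batch, heads, seq, d_head]
        return t.reshape(batch, seq, num_heads, d_head).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)   # [batch, heads, seq, seq]
    scores += np.triu(np.full((seq, seq), -np.inf), k=1)     # causal mask
    weights = softmax(scores)                                # attention weights
    out = weights @ v                                        # [batch, heads, seq, d_head]
    out = out.transpose(0, 2, 1, 3).reshape(batch, seq, d_model)
    return out @ wo, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3, 8))
out, weights = multi_head_attention(x, num_heads=2, rng=rng)
print(out.shape, weights.shape)   # (1, 3, 8) (1, 2, 3, 3)
```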

Feed Forward Network

Processes each position independently
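
A sketch with toy dimensions; the 4x inner width and ReLU activation are common conventions rather than something the demo specifies (GPT-style models typically use GELU).

```python
import numpy as np

# The same two-layer MLP is applied to every position independently.
def feed_forward(x, w1, b1, w2, b2):
    hidden = np.maximum(0.0, x @ w1 + b1)   # ReLU nonlinearity
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
w1 = rng.normal(scale=0.02, size=(d_model, d_ff))
w2 = rng.normal(scale=0.02, size=(d_ff, d_model))
b1, b2 = np.zeros(d_ff), np.zeros(d_model)

x = rng.normal(size=(1, 3, d_model))          # [batch, seq, d_model]
print(feed_forward(x, w1, b1, w2, b2).shape)  # (1, 3, 8): shape is preserved
```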

Output Layer

Produces probability distribution over tokens
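
A sketch of the final projection and softmax with toy sizes (the demo's real shapes are [1, 3, 4096] hidden states and [1, 3, 50257] logits):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, vocab_size = 8, 100
w_out = rng.normal(scale=0.02, size=(d_model, vocab_size))

hidden = rng.normal(size=(1, 3, d_model))      # final hidden states [batch, seq, d_model]
logits = hidden @ w_out                        # [1, 3, 100]  (real model: [1, 3, 50257])
next_token_probs = softmax(logits[:, -1, :])   # distribution for the NEXT token
print(next_token_probs.shape, next_token_probs.sum())   # (1, 100), sums to 1.0
```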

Generated Tokens

Generated so far: Once upon a ...

Input prompt: "Once upon a"

Token Probabilities

Showing the top 5 most probable next tokens
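
A small sketch of how such a top-5 list can be read off a probability vector; the tokens and probabilities here are invented for illustration, not the demo's actual values.

```python
import numpy as np

vocab = ["time", "midnight", "hill", "day", "dream", "cold", "rainy"]
probs = np.array([0.55, 0.12, 0.10, 0.08, 0.06, 0.05, 0.04])   # toy distribution

top5 = np.argsort(probs)[::-1][:5]      # indices of the 5 highest probabilities
for idx in top5:
    print(f"{vocab[idx]:>10s}  {probs[idx]:.2f}")
```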

Tensor Shapes

Input tokens: [1, 3] (batch of 1, 3-token prompt)
Embedding: [1, 3, 4096] (one 4096-dimensional vector per token)
Attention weights: [12, 1, 3, 3] (12 attention maps, each 3×3 over the token pairs)
FFN output: [1, 3, 4096] (same shape as its input)
Logits: [1, 3, 50257] (one score per vocabulary token at each position)