How AI Actually Works
You don’t need to understand the math to use AI well. But you do need a mental model that matches reality — otherwise the behavior seems random, and the results stay mediocre.
This page is the dinner-party version. No equations, no papers. Just the concepts that actually matter.
What Is an LLM?
LLM stands for Large Language Model. It’s the technology behind Claude, ChatGPT, Gemini, and most AI assistants you’ve encountered.
Here’s the most useful analogy:
Imagine the world’s most well-read intern. They’ve read essentially all of human writing — books, articles, research papers, forum posts, code, legal documents, recipes, everything. They can synthesize, summarize, explain, translate, and write across nearly any topic. They’re fast, tireless, and genuinely knowledgeable. But they need clear instructions, they can get things wrong, and they don’t remember yesterday’s conversation unless you tell them what happened.
That’s roughly what you’re working with.
The key thing to internalize: it’s not a search engine and it’s not a person. It doesn’t “look things up” in real time. It doesn’t have opinions in the way a person does. It generates responses based on patterns learned during training.
How It Actually Generates Text
Here’s the simplified version of what happens when you send a message:
The model reads your entire conversation (everything in the “context window”) and predicts the most likely next word — one token at a time. Then it does it again. And again. Until the response is complete.
That’s really it. It’s not reasoning in the way humans reason. It’s not searching a database. It’s doing very sophisticated pattern completion at massive scale.
This explains a lot:
- Why it sounds confident even when it’s wrong (confidence is a pattern, not a signal of accuracy)
- Why the phrasing of your question changes the answer (different input, different pattern activation)
- Why it can write convincingly in any style (it’s seen thousands of examples of every style)
- Why it can generate plausible-sounding nonsense (plausible ≠ accurate)
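The generate-one-token-then-repeat loop can be sketched in a few lines of Python. The “model” here is just a hand-made lookup table of next-word scores standing in for a real neural network; the words and scores are invented for illustration, not taken from any real system.

```python
# Toy sketch of the autoregressive loop. A real LLM replaces this
# lookup table with a neural network scoring ~100,000 possible tokens.
NEXT_WORD_SCORES = {
    "the":  {"cat": 0.5, "dog": 0.3, "idea": 0.2},
    "cat":  {"sat": 0.6, "ran": 0.4},
    "sat":  {"down": 0.7, "quietly": 0.3},
    "down": {"<end>": 1.0},
}

def generate(prompt_words, max_words=10):
    words = list(prompt_words)
    for _ in range(max_words):
        scores = NEXT_WORD_SCORES.get(words[-1], {"<end>": 1.0})
        # Pick the single most likely next word (greedy decoding).
        next_word = max(scores, key=scores.get)
        if next_word == "<end>":
            break
        words.append(next_word)  # the new word becomes part of the input
    return words

print(generate(["the"]))  # → ['the', 'cat', 'sat', 'down']
```

Notice that each new word is appended to the input before the next prediction: the model’s own output feeds back in, which is why early wording choices steer everything that follows.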
!!! note "This is oversimplified"
    Modern LLMs do a lot more than raw next-token prediction — there’s reinforcement learning from human feedback, chain-of-thought training, and other techniques. But the core pattern-completion intuition is still the right mental model for understanding behavior.
What Is a Token?
When a model reads text, it doesn’t process it letter by letter or word by word. It breaks text into tokens — chunks that are roughly 3-4 characters or about 0.75 words on average.
“Hello, how are you?” is about 6 tokens. A 1,000-word article is roughly 1,300 tokens.
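Those rules of thumb can be turned into a quick back-of-envelope estimator. This is a rough heuristic, not a real tokenizer — actual tokenizers (such as BPE) split text by learned rules and will give somewhat different counts:

```python
# Rough token estimate from the two rules of thumb above:
# ~4 characters per token, ~0.75 words per token.
def estimate_tokens(text: str) -> int:
    by_chars = len(text) / 4             # characters-based guess
    by_words = len(text.split()) / 0.75  # words-based guess
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Hello, how are you?"))  # roughly 5, close to the ~6 above
```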
Why it matters:
- Cost. AI APIs charge per token (both input and output). More tokens = more cost. For most users on free tiers, this doesn’t matter. For developers building products, it matters a lot.
- Context limits. Every model has a maximum number of tokens it can process at once. That’s the “context window.”
Context Windows: Claude’s Working Memory
The context window is the total amount of text a model can hold in its “working memory” at once — everything you’ve said, everything it’s said back, any documents you’ve pasted in, and any instructions set up in advance.
Think of it like a whiteboard. Everything written on that whiteboard is what the model can “see” and reason about. When the whiteboard fills up, old content falls off.
Why it matters in practice:
- Want to ask questions about a 50-page document? You need a model with a large enough context window to hold it.
- Long conversations eventually lose their early context.
- Pasting in a lot of background material “uses up” space.
Rough comparison of current models (early 2026):
| Model | Context Window | Notes |
|---|---|---|
| Claude Sonnet 4.6 | 200,000 tokens | ~150,000 words |
| Claude Opus 4.6 | 200,000 tokens | ~150,000 words |
| GPT-4o | 128,000 tokens | ~96,000 words |
| Gemini 2.0 Flash | 1,000,000 tokens | Fast and large context |
Claude’s 200K context window means you can paste in an entire novel, a full codebase, or months of email history and have a conversation about all of it.
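The whiteboard behavior can be sketched as a simple trimming loop: keep the newest messages that fit within a token budget and let the oldest fall off. The message history and token counts below are invented for illustration (real chat products use more sophisticated strategies, such as summarizing old turns):

```python
# Sketch of the "whiteboard filling up": newest messages are kept,
# oldest are dropped once the token budget is exceeded.
def trim_to_budget(messages, budget):
    """messages: list of (text, token_count) pairs, oldest first."""
    kept, used = [], 0
    for text, tokens in reversed(messages):  # walk newest-first
        if used + tokens > budget:
            break  # whiteboard is full; everything older falls off
        kept.append((text, tokens))
        used += tokens
    return list(reversed(kept))  # restore chronological order

history = [("intro", 50), ("long doc", 900), ("question", 40), ("answer", 60)]
print(trim_to_budget(history, 1000))  # the earliest message is dropped
```

This is why long conversations “forget” their beginnings: the early turns are literally no longer in the input the model sees.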
What AI Is Genuinely Good At
These are areas where current models consistently deliver:
| Task | Why It Works Well |
|---|---|
| Summarization | Synthesis is a core strength — pulling key points from large volumes of text |
| Drafting and editing | Seen millions of examples of every writing style and format |
| Explaining concepts | Can target any level of background knowledge, use analogies, give examples |
| Code | Enormous training data in most programming languages; can read, write, debug |
| Brainstorming | Generating options, angles, and ideas quickly without judgment |
| Translation | Strong across major languages |
| Structured analysis | Comparing options, listing trade-offs, building frameworks |
| Format transformation | Turn a transcript into a summary, a list into a table, prose into bullet points |
What AI Is Genuinely Bad At
These are real limitations — not marketing spin:
| Limitation | What’s Actually Happening |
|---|---|
| Real-time information | Training data has a cutoff date. It doesn’t know what happened last week. |
| Precise math | It generates plausible-looking math but can make arithmetic errors — use a calculator for anything that matters |
| Consistent facts | It can state things confidently that are wrong. Always verify claims about facts, stats, or specific details. |
| Remembering past conversations | Each conversation starts fresh unless you provide history |
| Knowing what it doesn’t know | It doesn’t say “I’m not sure about that” nearly often enough |
| Following complex multi-step instructions | It can lose track of earlier instructions in long, complex prompts |
Why Hallucinations Happen
“Hallucination” is the term for when AI confidently states something that’s false.
It sounds alarming. But it makes sense once you understand the mechanism.
The model isn’t lying. It doesn’t have a concept of truth in the way a person does. It’s doing pattern completion. If a plausible-sounding answer exists in the patterns it learned, it will produce it — even if that specific fact is wrong.
Think of it like this: if you asked someone to complete the sentence “The capital of Australia is…” most people would say “Sydney” because Sydney is the most famous Australian city. But the correct answer is Canberra. The model makes the same kind of mistake — pattern-matching to the most plausible response rather than the accurate one.
What to do about it:
- Treat AI output as a first draft, not a final answer
- Verify any specific facts, statistics, names, or dates before using them
- Ask the model to flag uncertainty: “If you’re not confident about any of this, say so”
- For high-stakes content (medical, legal, financial), always go to primary sources
!!! warning "The confident tone is misleading"
    AI models write in the same confident tone whether they’re 100% correct or completely making something up. The tone is not a signal of accuracy. This trips up nearly every new user.
Temperature and Creativity
Most AI interfaces have a setting called temperature that controls how “creative” or “focused” the responses are.
- Low temperature (close to 0): More predictable, consistent, focused. Good for factual Q&A, code generation, precise tasks.
- High temperature (close to 1): More varied, creative, surprising. Good for brainstorming, creative writing, generating options.
In most consumer interfaces (claude.ai, ChatGPT), this is managed automatically. You won’t need to set it manually. But if you’re using the API or building applications, it’s an important lever.
Key Terms Glossary
| Term | Plain English Definition |
|---|---|
| LLM | Large Language Model — the technology behind AI assistants like Claude and ChatGPT |
| Token | The basic unit of text a model processes — roughly 0.75 words or 3-4 characters |
| Context window | The total text a model can “see” at once — its working memory |
| Prompt | The message or instructions you send to the model |
| System prompt | Behind-the-scenes instructions that shape how the model behaves throughout a conversation — set by developers or power users |
| Hallucination | When a model states something false with apparent confidence |
| Fine-tuning | Further training a model on specific data to customize its behavior for a particular use case |
| RAG | Retrieval-Augmented Generation — a technique that lets a model look things up in a specific knowledge base before responding. Gets around the training cutoff problem. |
| Embedding | A numerical representation of text that captures meaning — used for semantic search and similarity comparisons |
| API | Application Programming Interface — the way developers send prompts to a model programmatically and get responses back |
| Temperature | A setting that controls how creative vs. focused model outputs are |
| Inference | The process of generating a response — what happens when you hit send |