How AI Actually Works

You don’t need to understand the math to use AI well. But you do need a mental model that matches reality — otherwise the behavior seems random, and the results stay mediocre.

This page is the dinner-party version. No equations, no papers. Just the concepts that actually matter.


LLM stands for Large Language Model. It’s the technology behind Claude, ChatGPT, Gemini, and most AI assistants you’ve encountered.

Here’s the most useful analogy:

Imagine the world’s most well-read intern. They’ve read essentially all of human writing — books, articles, research papers, forum posts, code, legal documents, recipes, everything. They can synthesize, summarize, explain, translate, and write across nearly any topic. They’re fast, tireless, and genuinely knowledgeable. But they need clear instructions, they can get things wrong, and they don’t remember yesterday’s conversation unless you tell them what happened.

That’s roughly what you’re working with.

The key thing to internalize: it’s not a search engine and it’s not a person. It doesn’t “look things up” in real time. It doesn’t have opinions in the way a person does. It generates responses based on patterns learned during training.


Here’s the simplified version of what happens when you send a message:

The model reads your entire conversation (everything in the “context window”) and predicts the most likely next word — one token at a time. Then it does it again. And again. Until the response is complete.

That’s really it. It’s not reasoning in the way humans reason. It’s not searching a database. It’s doing very sophisticated pattern completion at massive scale.
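The generation loop described above can be sketched with a toy example. This is not how a real LLM works internally (real models use neural networks over tokens, and all names here are illustrative), but a tiny bigram "model" shows the same loop shape: predict the most likely next word, append it, repeat.

```python
# Toy illustration of the generation loop: predict the next word from
# bigram counts in a small corpus, append it, and repeat.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count which word follows which (a bigram table).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most common next word seen in the corpus, or None."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

def generate(prompt_word, max_words=5):
    """Repeatedly predict-and-append, one word at a time."""
    out = [prompt_word]
    for _ in range(max_words):
        nxt = predict_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

print(generate("the"))
```

Notice that the loop never checks whether the output is *true*; it only extends the most familiar pattern. That is the intuition to keep.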

This explains a lot:

  • Why it sounds confident even when it’s wrong (confidence is a pattern, not a signal of accuracy)
  • Why the phrasing of your question changes the answer (different input, different pattern activation)
  • Why it can write convincingly in any style (it’s seen thousands of examples of every style)
  • Why it can generate plausible-sounding nonsense (plausible ≠ accurate)

!!! note "This is oversimplified"

    Modern LLMs do a lot more than raw next-token prediction — there's reinforcement learning from human feedback, chain-of-thought training, and other techniques. But the core pattern-completion intuition is still the right mental model for understanding behavior.


When a model reads text, it doesn’t process it letter by letter or word by word. It breaks text into tokens — chunks that are roughly 3-4 characters or about 0.75 words on average.

“Hello, how are you?” is about 6 tokens. A 1,000-word article is roughly 1,300 tokens.

Why it matters:

  1. Cost. AI APIs charge per token (both input and output). More tokens = more cost. For most users on free tiers, this doesn’t matter. For developers building products, it matters a lot.

  2. Context limits. Every model has a maximum number of tokens it can process at once. That’s the “context window.”
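The rules of thumb above (~0.75 words per token, ~3-4 characters per token) are enough for back-of-the-envelope planning. Here is a rough estimator built only on those rules; real tokenizers vary by model, so treat the numbers as approximations, not exact counts.

```python
# Back-of-the-envelope token estimates from the rules of thumb:
# ~0.75 words per token and ~4 characters per token.

def estimate_tokens(text):
    """Average of a word-based and a character-based estimate."""
    by_words = len(text.split()) / 0.75   # ~0.75 words per token
    by_chars = len(text) / 4              # ~4 characters per token
    return round((by_words + by_chars) / 2)

print(estimate_tokens("Hello, how are you?"))  # a handful of tokens
print(round(1000 / 0.75))                      # a 1,000-word article: ~1,300 tokens
```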


Context Windows: Claude’s Working Memory


The context window is the total amount of text a model can hold in its “working memory” at once — everything you’ve said, everything it’s said back, any documents you’ve pasted in, and any instructions set up in advance.

Think of it like a whiteboard. Everything written on that whiteboard is what the model can “see” and reason about. When the whiteboard fills up, old content falls off.

Why it matters in practice:

  • Want to ask questions about a 50-page document? You need a model with a large enough context window to hold it.
  • Long conversations eventually lose their early context.
  • Pasting in a lot of background material “uses up” space.
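The "whiteboard filling up" behavior can be sketched as a trimming loop: when a conversation exceeds the context budget, the oldest turns are dropped so recent ones fit. The token counts here are stand-ins; real systems measure actual tokens and often preserve the system prompt separately.

```python
# Sketch of context-window trimming: oldest content "falls off the
# whiteboard" once the conversation exceeds the token budget.

def trim_to_budget(messages, budget_tokens, count_tokens):
    """Drop the oldest messages until the total fits the budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget_tokens:
        kept.pop(0)  # discard the oldest message first
    return kept

# Toy example: count one "token" per word.
count = lambda msg: len(msg.split())
history = ["first long message " * 10, "second message", "latest question"]
print(trim_to_budget(history, budget_tokens=10, count_tokens=count))
```

This is why a long conversation eventually "forgets" its opening: the early messages are no longer on the whiteboard at all.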

Rough comparison of current models (early 2026):

| Model | Context Window | Notes |
| --- | --- | --- |
| Claude Sonnet 4.6 | 200,000 tokens | ~150,000 words |
| Claude Opus 4.6 | 200,000 tokens | ~150,000 words |
| GPT-4o | 128,000 tokens | ~96,000 words |
| Gemini 2.0 Flash | 1,000,000 tokens | Fast and large context |

Claude’s 200K context window means you can paste in an entire novel, a full codebase, or months of email history and have a conversation about all of it.


These are areas where current models consistently deliver:

| Task | Why It Works Well |
| --- | --- |
| Summarization | Synthesis is a core strength — pulling key points from large volumes of text |
| Drafting and editing | Seen millions of examples of every writing style and format |
| Explaining concepts | Can target any level of background knowledge, use analogies, give examples |
| Code | Enormous training data in most programming languages; can read, write, debug |
| Brainstorming | Generating options, angles, and ideas quickly without judgment |
| Translation | Strong across major languages |
| Structured analysis | Comparing options, listing trade-offs, building frameworks |
| Format transformation | Turn a transcript into a summary, a list into a table, prose into bullet points |

These are real limitations — not marketing spin:

| Limitation | What's Actually Happening |
| --- | --- |
| Real-time information | Training data has a cutoff date. It doesn't know what happened last week. |
| Precise math | It generates plausible-looking math but can make arithmetic errors — use a calculator for anything that matters |
| Consistent facts | It can state things confidently that are wrong. Always verify claims about facts, stats, or specific details. |
| Remembering past conversations | Each conversation starts fresh unless you provide history |
| Knowing what it doesn't know | It doesn't say "I'm not sure about that" nearly often enough |
| Following complex multi-step instructions | It can lose track of earlier instructions in long, complex prompts |

“Hallucination” is the term for when AI confidently states something that’s false.

It sounds alarming. But it makes sense once you understand the mechanism.

The model isn’t lying. It doesn’t have a concept of truth in the way a person does. It’s doing pattern completion. If a plausible-sounding answer exists in the patterns it learned, it will produce it — even if that specific fact is wrong.

Think of it like this: if you asked someone to complete the sentence “The capital of Australia is…” most people would say “Sydney” because Sydney is the most famous Australian city. But the correct answer is Canberra. The model makes the same kind of mistake — pattern-matching to the most plausible response rather than the accurate one.

What to do about it:

  • Treat AI output as a first draft, not a final answer
  • Verify any specific facts, statistics, names, or dates before using them
  • Ask the model to flag uncertainty: “If you’re not confident about any of this, say so”
  • For high-stakes content (medical, legal, financial), always go to primary sources

!!! warning "The confident tone is misleading"

    AI models write in the same confident tone whether they're 100% correct or completely making something up. The tone is not a signal of accuracy. This trips up nearly every new user.


Most AI interfaces have a setting called temperature that controls how “creative” or “focused” the responses are.

  • Low temperature (close to 0): More predictable, consistent, focused. Good for factual Q&A, code generation, precise tasks.
  • High temperature (close to 1): More varied, creative, surprising. Good for brainstorming, creative writing, generating options.

In most consumer interfaces (claude.ai, ChatGPT), this is managed automatically. You won’t need to set it manually. But if you’re using the API or building applications, it’s an important lever.
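Under the hood, temperature reshapes the model's next-token probability distribution: the model's raw scores (logits) are divided by the temperature before being converted to probabilities, so low temperature sharpens the distribution and high temperature flattens it. A minimal sketch, with made-up logit values:

```python
# Temperature-scaled softmax: divide logits by the temperature, then
# normalize. Low temperature -> the top token dominates (predictable);
# high temperature -> probabilities spread out (varied).
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
print(softmax_with_temperature(logits, 0.2))  # low T: top token dominates
print(softmax_with_temperature(logits, 1.0))  # higher T: more spread
```

This is why low-temperature output is consistent run to run, while high-temperature output varies: sampling from a flatter distribution picks the less likely tokens more often.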


| Term | Plain English Definition |
| --- | --- |
| LLM | Large Language Model — the technology behind AI assistants like Claude and ChatGPT |
| Token | The basic unit of text a model processes — roughly 0.75 words or 3-4 characters |
| Context window | The total text a model can "see" at once — its working memory |
| Prompt | The message or instructions you send to the model |
| System prompt | Behind-the-scenes instructions that shape how the model behaves throughout a conversation — set by developers or power users |
| Hallucination | When a model states something false with apparent confidence |
| Fine-tuning | Further training a model on specific data to customize its behavior for a particular use case |
| RAG | Retrieval-Augmented Generation — a technique that lets a model look things up in a specific knowledge base before responding. Gets around the training cutoff problem. |
| Embedding | A numerical representation of text that captures meaning — used for semantic search and similarity comparisons |
| API | Application Programming Interface — the way developers send prompts to a model programmatically and get responses back |
| Temperature | A setting that controls how creative vs. focused model outputs are |
| Inference | The process of generating a response — what happens when you hit send |