How AI Actually Works

You don’t need to understand the math to use AI well. But you do need a mental model that matches reality — otherwise the behavior seems random, and the results stay mediocre.

This page is the dinner-party version. No equations, no papers. Just the concepts that actually matter.


LLM stands for Large Language Model. It’s the technology behind Claude, ChatGPT, Gemini, and most AI assistants you’ve encountered.

Here’s the most useful analogy:

Imagine the world’s most well-read intern. They’ve read essentially all of human writing — books, articles, research papers, forum posts, code, legal documents, recipes, everything. They can synthesize, summarize, explain, translate, and write across nearly any topic. They’re fast, tireless, and genuinely knowledgeable. But they need clear instructions, they can get things wrong, and they don’t remember yesterday’s conversation unless you tell them what happened.

That’s roughly what you’re working with.

The key thing to internalize: it’s not a search engine and it’s not a person. It doesn’t “look things up” in real time. It doesn’t have opinions in the way a person does. It generates responses based on patterns learned during training.


Here’s the simplified version of what happens when you send a message:

The model reads your entire conversation (everything in the “context window”) and predicts the most likely next word — one token at a time. Then it does it again. And again. Until the response is complete.

That’s really it. It’s not reasoning in the way humans reason. It’s not searching a database. It’s doing very sophisticated pattern completion at massive scale.
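The generation loop described above can be sketched with a toy example. This is not how a real LLM works internally (real models use neural networks over tokens, and all names here are illustrative), but a tiny bigram "model" shows the same loop shape: predict the most likely next word, append it, repeat.

```python
# Toy illustration of the generation loop: predict the next word from
# bigram counts in a small corpus, append it, and repeat.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count which word follows which (a bigram table).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most common next word seen in the corpus, or None."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

def generate(prompt_word, max_words=5):
    """Repeatedly predict-and-append, one word at a time."""
    out = [prompt_word]
    for _ in range(max_words):
        nxt = predict_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

print(generate("the"))
```

Notice that the loop never checks whether the output is *true*; it only extends the most familiar pattern. That is the intuition to keep.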

This explains a lot:

  • Why it sounds confident even when it’s wrong (confidence is a pattern, not a signal of accuracy)
  • Why the phrasing of your question changes the answer (different input, different pattern activation)
  • Why it can write convincingly in any style (it’s seen thousands of examples of every style)
  • Why it can generate plausible-sounding nonsense (plausible ≠ accurate)

!!! note "This is oversimplified"

    Modern LLMs do a lot more than raw next-token prediction — there's reinforcement learning from human feedback, chain-of-thought training, and other techniques. But the core pattern-completion intuition is still the right mental model for understanding behavior.


When a model reads text, it doesn’t process it letter by letter or word by word. It breaks text into tokens — chunks that are roughly 3-4 characters or about 0.75 words on average.

“Hello, how are you?” is about 6 tokens. A 1,000-word article is roughly 1,300 tokens.

Why it matters:

  1. Cost. AI APIs charge per token (both input and output). More tokens = more cost. For most users on free tiers, this doesn’t matter. For developers building products, it matters a lot.

  2. Context limits. Every model has a maximum number of tokens it can process at once. That’s the “context window.”
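The rules of thumb above (~0.75 words per token, ~3-4 characters per token) are enough for back-of-the-envelope planning. Here is a rough estimator built only on those rules; real tokenizers vary by model, so treat the numbers as approximations, not exact counts.

```python
# Back-of-the-envelope token estimates from the rules of thumb:
# ~0.75 words per token and ~4 characters per token.

def estimate_tokens(text):
    """Average of a word-based and a character-based estimate."""
    by_words = len(text.split()) / 0.75   # ~0.75 words per token
    by_chars = len(text) / 4              # ~4 characters per token
    return round((by_words + by_chars) / 2)

print(estimate_tokens("Hello, how are you?"))  # a handful of tokens
print(round(1000 / 0.75))                      # a 1,000-word article: ~1,300 tokens
```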


Context Windows: Claude’s Working Memory


The context window is the total amount of text a model can hold in its “working memory” at once — everything you’ve said, everything it’s said back, any documents you’ve pasted in, and any instructions set up in advance.

Think of it like a whiteboard. Everything written on that whiteboard is what the model can “see” and reason about. When the whiteboard fills up, old content falls off.

Why it matters in practice:

  • Want to ask questions about a 50-page document? You need a model with a large enough context window to hold it.
  • Long conversations eventually lose their early context.
  • Pasting in a lot of background material “uses up” space.
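The "whiteboard filling up" behavior can be sketched as a trimming loop: when a conversation exceeds the context budget, the oldest turns are dropped so recent ones fit. The token counts here are stand-ins; real systems measure actual tokens and often preserve the system prompt separately.

```python
# Sketch of context-window trimming: oldest content "falls off the
# whiteboard" once the conversation exceeds the token budget.

def trim_to_budget(messages, budget_tokens, count_tokens):
    """Drop the oldest messages until the total fits the budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget_tokens:
        kept.pop(0)  # discard the oldest message first
    return kept

# Toy example: count one "token" per word.
count = lambda msg: len(msg.split())
history = ["first long message " * 10, "second message", "latest question"]
print(trim_to_budget(history, budget_tokens=10, count_tokens=count))
```

This is why a long conversation eventually "forgets" its opening: the early messages are no longer on the whiteboard at all.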

Rough comparison of current models (early 2026):

| Model | Context Window | Notes |
| --- | --- | --- |
| Claude Sonnet 4.6 | 200,000 tokens | ~150,000 words |
| Claude Opus 4.6 | 200,000 tokens | ~150,000 words |
| GPT-4o | 128,000 tokens | ~96,000 words |
| Gemini 2.0 Flash | 1,000,000 tokens | Fast and large context |

Claude’s 200K context window means you can paste in an entire novel, a full codebase, or months of email history and have a conversation about all of it.


These are areas where current models consistently deliver:

| Task | Why It Works Well |
| --- | --- |
| Summarization | Synthesis is a core strength — pulling key points from large volumes of text |
| Drafting and editing | Seen millions of examples of every writing style and format |
| Explaining concepts | Can target any level of background knowledge, use analogies, give examples |
| Code | Enormous training data in most programming languages; can read, write, debug |
| Brainstorming | Generating options, angles, and ideas quickly without judgment |
| Translation | Strong across major languages |
| Structured analysis | Comparing options, listing trade-offs, building frameworks |
| Format transformation | Turn a transcript into a summary, a list into a table, prose into bullet points |

These are real limitations — not marketing spin:

| Limitation | What's Actually Happening |
| --- | --- |
| Real-time information | Training data has a cutoff date. It doesn't know what happened last week. |
| Precise math | It generates plausible-looking math but can make arithmetic errors — use a calculator for anything that matters |
| Consistent facts | It can state things confidently that are wrong. Always verify claims about facts, stats, or specific details. |
| Remembering past conversations | Each conversation starts fresh unless you provide history |
| Knowing what it doesn't know | It doesn't say "I'm not sure about that" nearly often enough |
| Following complex multi-step instructions | It can lose track of earlier instructions in long, complex prompts |

“Hallucination” is the term for when AI confidently states something that’s false.

It sounds alarming. But it makes sense once you understand the mechanism.

The model isn’t lying. It doesn’t have a concept of truth in the way a person does. It’s doing pattern completion. If a plausible-sounding answer exists in the patterns it learned, it will produce it — even if that specific fact is wrong.

Think of it like this: if you asked someone to complete the sentence “The capital of Australia is…” most people would say “Sydney” because Sydney is the most famous Australian city. But the correct answer is Canberra. The model makes the same kind of mistake — pattern-matching to the most plausible response rather than the accurate one.

What to do about it:

  • Treat AI output as a first draft, not a final answer
  • Verify any specific facts, statistics, names, or dates before using them
  • Ask the model to flag uncertainty: “If you’re not confident about any of this, say so”
  • For high-stakes content (medical, legal, financial), always go to primary sources

!!! warning "The confident tone is misleading"

    AI models write in the same confident tone whether they're 100% correct or completely making something up. The tone is not a signal of accuracy. This trips up nearly every new user.


Most AI interfaces have a setting called temperature that controls how “creative” or “focused” the responses are.

  • Low temperature (close to 0): More predictable, consistent, focused. Good for factual Q&A, code generation, precise tasks.
  • High temperature (close to 1): More varied, creative, surprising. Good for brainstorming, creative writing, generating options.

In most consumer interfaces (claude.ai, ChatGPT), this is managed automatically. You won’t need to set it manually. But if you’re using the API or building applications, it’s an important lever.
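Under the hood, temperature reshapes the model's next-token probability distribution: the model's raw scores (logits) are divided by the temperature before being converted to probabilities, so low temperature sharpens the distribution and high temperature flattens it. A minimal sketch, with made-up logit values:

```python
# Temperature-scaled softmax: divide logits by the temperature, then
# normalize. Low temperature -> the top token dominates (predictable);
# high temperature -> probabilities spread out (varied).
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
print(softmax_with_temperature(logits, 0.2))  # low T: top token dominates
print(softmax_with_temperature(logits, 1.0))  # higher T: more spread
```

This is why low-temperature output is consistent run to run, while high-temperature output varies: sampling from a flatter distribution picks the less likely tokens more often.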


| Term | Plain English Definition |
| --- | --- |
| LLM | Large Language Model — the technology behind AI assistants like Claude and ChatGPT |
| Token | The basic unit of text a model processes — roughly 0.75 words or 3-4 characters |
| Context window | The total text a model can "see" at once — its working memory |
| Prompt | The message or instructions you send to the model |
| System prompt | Behind-the-scenes instructions that shape how the model behaves throughout a conversation — set by developers or power users |
| Hallucination | When a model states something false with apparent confidence |
| Fine-tuning | Further training a model on specific data to customize its behavior for a particular use case |
| RAG | Retrieval-Augmented Generation — a technique that lets a model look things up in a specific knowledge base before responding. Gets around the training cutoff problem. |
| Embedding | A numerical representation of text that captures meaning — used for semantic search and similarity comparisons |
| API | Application Programming Interface — the way developers send prompts to a model programmatically and get responses back |
| Temperature | A setting that controls how creative vs. focused model outputs are |
| Inference | The process of generating a response — what happens when you hit send |