AI from Scratch · 6 min read

AI from Scratch #3: Why Autocomplete Gets Smarter Over Time

You finish your best friend's sentences all the time. ChatGPT does the same thing — except it read the entire internet first. Here's how language models actually work.


Raghu Mudumbai

CEO & Chief Scientist, netcausal.ai

Finishing Each Other's Sentences

You're texting your friend and you type: "Want to grab..." Before you finish, your phone suggests "lunch," "coffee," or "dinner." How did it know?

Or think about your best friend. If they say "I can't believe the teacher gave us homework on—" you already know the next word is probably "Friday" or "vacation" or "the weekend." You predicted it because you know your friend, you know the context, and you've heard thousands of similar sentences before.

That's exactly how ChatGPT, Google's autocomplete, and every language AI works. They're doing one thing: predicting the next word.

That's it. The entire foundation of the most impressive AI you've ever used — the one that writes essays, answers questions, tells jokes, and debates philosophy — is just next-word prediction, done really, really well.

The World's Most Educated Guessing Game

Here's how it works at the simplest level.

Take this sentence: "The cat sat on the ___"

What goes in the blank? You'd probably say "mat," "couch," "floor," or "table." You wouldn't say "airplane" or "democracy" or "purple."

Why? Because you've read and heard enough English to know what words typically follow "the cat sat on the." You have a mental model of which words are likely and which are absurd.

A language model builds the same kind of mental model — except instead of learning from conversations with friends and books you've read, it learns from billions of web pages, books, articles, and conversations. It reads more text in its training than any human could read in a thousand lifetimes.

After all that reading, it develops incredibly detailed intuitions about which words follow which. Not just simple patterns like "cat sat on the mat," but complex ones like: after a paragraph about quantum physics that ends with "the implications for computing are," the next word is probably "profound" or "significant" or "enormous."

How the Prediction Actually Happens

Let's peek under the hood. When a language model sees the phrase "The cat sat on the," here's what happens:

Step 1: Turn words into numbers. Every word gets converted into a list of numbers (called a "vector") that captures its meaning. Words with similar meanings get similar numbers. "King" and "queen" have similar numbers. "King" and "banana" don't.
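The "similar meanings get similar numbers" idea can be sketched in a few lines. These 3-number vectors are made up for illustration (real models use hundreds or thousands of dimensions, learned from data), but the comparison — cosine similarity, a standard way to measure how alike two vectors are — is the real thing:

```python
import math

# Toy 3-dimensional "embeddings". The numbers are invented for
# illustration; real models learn much longer vectors from data.
embeddings = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.7, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """How closely two vectors point the same way (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # high
print(cosine_similarity(embeddings["king"], embeddings["banana"]))  # low
```

Run it and "king"/"queen" score far higher than "king"/"banana" — similarity in meaning shows up as similarity in numbers.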

Step 2: Look at context. The model doesn't just look at the last word — it looks at the entire conversation so far. This is where the "Transformer" architecture comes in (the T in GPT). It uses something called "attention" to figure out which earlier words matter most for predicting the next one.

Think of it like this: in the sentence "The doctor told the patient that she should take the medicine with ___", the model needs to "pay attention" to "medicine" and "take" to predict "water" or "food." It's not just looking at the word "with" — it's looking at the whole context.
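Here's a minimal sketch of that attention idea, using made-up 2-number vectors for the words in "take the medicine with". The mechanism — score each earlier word against a query, then turn the scores into weights — is the real dot-product attention used in Transformers; only the vectors are invented:

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Score the query against each earlier word (scaled dot product),
    # then normalize the scores into attention weights.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    return softmax(scores)

# Invented 2-d vectors for the words in "take the medicine with ___"
words = ["take", "the", "medicine", "with"]
keys  = [[0.8, 0.1], [0.1, 0.1], [0.9, 0.2], [0.2, 0.1]]
query = [1.0, 0.2]  # stands in for "what should come next?"

for word, weight in zip(words, attention_weights(query, keys)):
    print(f"{word:>8}: {weight:.2f}")
```

With these toy numbers, "medicine" and "take" get the most weight and filler words like "the" get little — the model learns to focus on the words that matter for the prediction.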

Step 3: Calculate probabilities. The model outputs a probability for every possible next word. Maybe "mat" gets 25%, "floor" gets 20%, "couch" gets 15%, "table" gets 10%, and thousands of other words split the remaining 30%.
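Under the hood, the model actually produces a raw score for every word in its vocabulary, and a function called softmax converts those scores into probabilities that add up to 100%. A sketch with invented scores for a handful of candidates:

```python
import math

def softmax(logits):
    """Convert raw scores ("logits") into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented raw scores for a few candidate next words after
# "The cat sat on the". A real model scores tens of thousands of tokens.
candidates = ["mat", "floor", "couch", "airplane"]
logits = [3.0, 2.8, 2.5, -4.0]

for word, p in zip(candidates, softmax(logits)):
    print(f"{word:>9}: {p:.1%}")
```

Notice how softmax exaggerates differences: "airplane" isn't just less likely, it ends up with a probability near zero.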

Step 4: Pick one. The model picks a word (sometimes the most likely, sometimes a slightly less likely one for variety) and outputs it. Then it adds that word to the sentence and repeats the whole process for the next word.
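Step 4's "sometimes the most likely, sometimes a slightly less likely one" is controlled by a setting usually called temperature. A minimal sketch, using the illustrative probabilities from Step 3:

```python
import random

# Step 3's output: a probability for each candidate next word
# (illustrative numbers; a real model scores its whole vocabulary).
next_word_probs = {
    "mat": 0.25, "floor": 0.20, "couch": 0.15, "table": 0.10, "rug": 0.30,
}

def pick_next_word(probs, temperature=1.0):
    """Sample a word. Low temperature favors the top choice; high adds variety."""
    words = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(words, weights=weights, k=1)[0]

sentence = "The cat sat on the " + pick_next_word(next_word_probs)
print(sentence)
```

Run it a few times and you'll get different endings — that's why ChatGPT can give you a different answer to the same question twice.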

One word at a time. That's how ChatGPT "writes" — by playing the world's most sophisticated autocomplete game, one word at a time.

Wait — If It's Just Predicting Words, How Does It "Know" Things?

This is the most mind-bending part.

When you predict that "The capital of France is ___" ends with "Paris," you're not just doing word prediction — you're using knowledge. You know Paris is the capital of France.

Language models work the same way. By reading billions of documents, they absorb factual knowledge inside their word-prediction patterns. The model doesn't have a database that says "France → Paris." Instead, it learned that in the patterns of human language, the words "capital of France is" are overwhelmingly followed by "Paris."

Knowledge emerges from patterns. That's a deep idea, and it's one of the reasons language models are so impressive — and also why they sometimes get things wrong.

Why It Sometimes Makes Things Up

Here's the catch. The model always produces a next word, even when it doesn't "know" the answer. If you ask it an obscure question it wasn't trained on, it doesn't say "I don't know" by default. Instead, it generates the most probable-sounding sequence of words — which might be completely wrong but sounds convincing.

This is called hallucination, and it's one of the biggest challenges in AI today. The model is like that one friend who never admits they don't know something — they'll confidently give you an answer that sounds great but is totally made up.

Why does this happen? Because the model was trained to predict likely words, not truthful words. "The inventor of the telephone was Alexander Graham Bell" and "The inventor of the telephone was Thomas Edison" both sound like valid sentences. The model has to rely on which pattern it saw more often in its training data.

Try It Yourself

Open your phone's keyboard and start typing "I really love." Now tap the autocomplete suggestion over and over without choosing your own words. Let the phone write a whole paragraph for you.

What you get is a mini language model in action. It's predicting the next word based on patterns. It'll probably make grammatically correct sentences that sort of make sense — but might ramble or say something weird.
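You can build a crude version of that phone keyboard yourself. This sketch trains on one tiny made-up snippet of text: for each word, it records which words followed it, then "autocompletes" by repeatedly picking a recorded follower — the same predict-append-repeat loop, just with counting instead of a neural network:

```python
import random
from collections import defaultdict

# "Training data": one tiny invented snippet. A real model reads billions
# of documents; the mechanism of learning which-word-follows-which is the same idea.
text = ("i really love coffee and i really love tea and "
        "i really love my dog and my dog loves me").split()

# For each word, record every word that followed it in the training text.
following = defaultdict(list)
for word, nxt in zip(text, text[1:]):
    following[word].append(nxt)

# "Autocomplete": start with a word, repeatedly pick one of its followers.
word = "i"
output = [word]
for _ in range(8):
    word = random.choice(following.get(word, ["..."]))
    output.append(word)
print(" ".join(output))
```

Like your phone, it produces locally plausible text that can ramble — each word follows sensibly from the last, but there's no plan for the sentence as a whole.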

Now imagine that same system, but a trillion times more powerful, trained on the entire internet. That's ChatGPT.

The Big Takeaway

Language models are next-word predictors trained on enormous amounts of text. They don't "understand" language the way you do — they've just seen so many patterns that they can generate text that looks like it was written by someone who does understand.

The same core idea powers your phone's autocomplete, Google's search suggestions, email smart replies, ChatGPT, and every AI assistant you've talked to. The difference is just scale — how much data they trained on and how many parameters (those adjustable weights from Article #1) they have.

What's Next

In Article #4, we'll explore how AI learns from rewards — like training a dog with treats. It's called reinforcement learning, and it's how AI learned to beat humans at chess, Go, and video games.


This is part of the AI from Scratch series — making AI and machine learning understandable for everyone, no PhD required. Follow along on Medium or at netcausal.ai/blog.

ai-from-scratch · language-models · chatgpt · beginners · nlp
