Small Models, Big Knowledge: Prompting Past the First Guess
Smaller language models aren’t inherently dumb; their true potential shows when prompts steer retrieval away from easy generalizations, unlocking non-obvious knowledge and cutting costs.
Smaller Models Aren’t “Dumb”—They’re Just Easier to Misuse
One of the most surprising things I discovered when working with smaller language models is how quickly people dismiss them as “not capable.” In many cases, that judgment isn’t really about the model’s underlying ability—it’s about how the model is being prompted.
The real goal, of course, is to make these systems as easy to use as possible. If you have to spend a long time crafting prompts just to get decent output, it can feel like wasted effort. But there’s an important trade-off: if a smaller model can cut your costs dramatically—sometimes by half or more (think of the old GPT‑3 days, where Ada vs. DaVinci could be a big cost difference)—then learning how to prompt the smaller model well can be worth it.
The catch is that many people never learn how to do that. They try a straightforward prompt, get a wrong answer, and conclude the model “doesn’t know.” Often, that conclusion is wrong.
A Simple Example: Airport Codes
Here’s a concrete example.
If you ask a large model like GPT‑3 DaVinci for airport codes, it usually does great:
- Boston → BOS
- Los Angeles → LAX
- Orlando → MCO
Ask a smaller model, say Babbage (the same often happens with mid-sized models, depending on the task), and something interesting happens. It might get BOS right, and it might know LAX because it's so common. But when you ask for Orlando, it may answer:
- Orlando → ORL
That’s a plausible guess—but it’s wrong for Orlando International Airport (MCO). And it reveals what the smaller model is doing: it has generalized a rule.
“Airport codes are probably abbreviations of the city.”
That rule is sometimes true, but not always. The model isn’t necessarily lacking the correct information. It may simply be choosing an “efficient” shortcut instead of retrieving the less obvious fact.
The Key Insight: The Knowledge Can Be There—But Not Accessible by Default
When people see ORL, they often assume: “The model doesn’t know MCO.”
But given how much text these models have consumed, it’s very likely that MCO exists somewhere in the training data. The real problem is retrieval: the model is operating in the wrong “space,” following a generalization that works often enough.
So the question becomes:
If the information is in there, how do you get it out?
Prompting as Retrieval Steering
One method that works surprisingly well is to explicitly break the model out of its default assumption.
For example, you can prompt it with a short framing statement plus examples:
- “Airport codes are sometimes abbreviations of cities, and sometimes they aren’t.”
- Provide a couple examples where the airport code does not match the city name.
- Then ask: “What’s the airport code for Orlando?”
In that setup, the smaller model is much more likely to answer correctly:
- Orlando → MCO
What changed? You didn’t “teach” it the concept of airport codes. You gave it just enough context to stop relying on the easy generalization and instead search its latent knowledge for the non-obvious answer.
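The steering pattern above can be sketched as a small prompt builder. This is a minimal sketch: the helper name, the counterexample pairs, and the exact wording are my own, and the actual model call is left out.

```python
# Sketch: build a few-shot prompt that breaks the "code = city abbreviation"
# generalization before asking the target question.
# The helper name, example pairs, and wording are illustrative, not a fixed API.

def steering_prompt(counterexamples, city):
    """Frame the task, show codes that do NOT abbreviate the city, then ask."""
    lines = [
        "Airport codes are sometimes abbreviations of the city name,"
        " and sometimes they are not."
    ]
    for name, code in counterexamples:
        lines.append(f"{name} -> {code}")
    lines.append(f"{city} ->")
    return "\n".join(lines)

# Chicago (ORD) and Toronto (YYZ) are real codes that don't match the city name.
prompt = steering_prompt([("Chicago", "ORD"), ("Toronto", "YYZ")], "Orlando")
print(prompt)
```

Sending this framed prompt instead of the bare question is what nudges the completion toward MCO rather than the abbreviation-style guess ORL.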
This pattern shows up constantly: models take the easiest path—generalization or memorization—to produce an answer. If you want something that sits outside that easy path, you often have to guide the model toward it.
This Isn’t Just a Small-Model Problem
It’s tempting to think this is only about weaker models. It isn’t.
Even large models often look “limited” when what’s really happening is that the prompt is forcing them into an unhelpful reasoning strategy. I’ve seen this show up in academic work, too, where papers claim “models can’t do X,” but the prompts and setups are shallow—almost designed to fail. The model misses, the authors declare a limitation, and a paper gets written.
Two recurring themes:
- The "reversal" issue: There's an idea sometimes discussed (for example, under names like the "reversal curse") that models can generalize from A → B but not from B → A. In practice, this often depends heavily on training and, more importantly for everyday users, on how you ask. With the right prompting, you can often elicit the reverse mapping because the information is already represented; it's just not being accessed under the default framing.
- Sensitivity to problem structure or order: Some research has argued that models fail at certain types of problems if you change the order or structure of the question. But many of these failures can be mitigated with a simple instruction telling the model to evaluate the structure of the problem first and then answer accordingly. Once prompted to do that, the model becomes far less brittle.
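That structure-first instruction can be as simple as a fixed preamble prepended to every question. A minimal sketch, where the wording and names are my own rather than from any specific paper:

```python
# Sketch: prepend an instruction that forces a structure-first pass
# before the model commits to an answer. Illustrative wording only;
# the model call itself is omitted.

STRUCTURE_FIRST = (
    "Before answering, briefly state the structure of the problem: "
    "what is given, what is asked, and how the parts relate. "
    "Then answer based on that structure, not on the order in which "
    "the facts happen to appear."
)

def structure_first_prompt(question):
    """Wrap a question so the model analyzes structure before answering."""
    return f"{STRUCTURE_FIRST}\n\nProblem: {question}\nAnswer:"

print(structure_first_prompt(
    "Ann is taller than Ben. Carl is shorter than Ben. Who is tallest?"
))
```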
In other words: many “limitations” are really prompt-induced failure modes.
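The reversal issue responds to the same kind of steering: explicitly license reading facts in both directions before asking the reverse question. A minimal sketch, with facts and helper name of my own choosing:

```python
# Sketch: elicit a B -> A mapping by first demonstrating that the
# A -> B facts can be read in reverse. Illustrative wording; no model call.

def reverse_prompt(pairs, reverse_query):
    """Show forward facts, restate each in reverse, then ask the reverse question."""
    lines = ["These facts can be read in either direction."]
    for a, b in pairs:
        lines.append(f"{a} is the capital of {b},")
        lines.append(f"so the country whose capital is {a} is {b}.")
    lines.append(f"The country whose capital is {reverse_query} is")
    return "\n".join(lines)

print(reverse_prompt([("Paris", "France"), ("Tokyo", "Japan")], "Ottawa"))
```

The point is not the capitals themselves but the reframing: the reversed restatements put the model in a mode where B → A retrieval is the expected pattern.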
What to Take Away
If you work with language models—especially when cost matters—this is the lesson:
- Models often default to the most efficient shortcut.
- Smaller models rely on shortcuts more aggressively.
- Wrong answers don’t always mean missing knowledge.
- Prompting isn’t just about “telling it what to do”—it’s about steering retrieval and preventing lazy generalizations.
- Many headline “model can’t do this” claims collapse when you push the prompting further.
Smaller models aren’t automatically worse—they’re less forgiving. If you give them the right framing, examples, and constraints, they can often perform far beyond what a quick test suggests. And even with big models, the same principle holds: when the output looks like a limitation, it’s worth asking whether you’ve simply put the model into the wrong mode of thinking.