Using Small Models for Complex Natural-Language Tasks
Thoughtful prompting and lightweight schemas let small language models reliably convert flexible natural-language input into structured data for real-world tasks like scheduling, at a fraction of the typical cost.
One of the simplest “little” capabilities I found early on with language models ended up being one of the biggest practical game changers: making sense of basic natural-language instructions.
It still surprises me how long it took for this to show up widely in real products. Even now, I’ll use applications where I think, “Why didn’t anyone just put a small LLM in the middle to solve this?” The task isn’t glamorous: it’s often just taking what a person wrote, which is vague-but-human-clear, and turning it into something structured that software can reliably act on.
A few examples of things I built that worked remarkably well:
- Address formatters
- Data extraction helpers
- “Instruction normalizers” that turned user phrasing into clean parameters
But one of my favorite examples was scheduling/availability parsing: taking a sentence like “I’m available tomorrow or Wednesday after 8pm” and converting it into explicit dates and times.
Humans do this effortlessly. Traditional software usually makes you force people into rigid UI flows (“pick a date, pick a time, pick a timezone…”). But people don’t talk that way. They say:
- “tomorrow through thursday at 9 is perfect”
- “all week after 6 except Wednesday”
- “next month around the 15th”
And crucially: these are phrased very differently, but the intent is usually clear if you think about it for a moment.
This worked over five years ago (with smaller models)
Going through my notes, I came across a setup I used with Curie Instruct (a variation of the Curie model with instruction fine-tuning). The model could take simple natural language availability and “spit out” availability slots: a list of explicit dates/times, plus exceptions when needed.
This capability existed over five years ago. What took longer was for teams to actually build it into applications in a cost-effective way.
I think part of what happened is that many teams got stuck in the “GPT-3 DaVinci world”: use the biggest model, pay the biggest price. If the smaller model struggled on an early attempt, people would conclude, “It can’t do it,” and move on. Then the product decision would become: “We can’t add that feature—it’s too expensive.”
But in many cases, it wasn’t that the smaller model couldn’t do it. It was that it needed better prompting and a more constrained setup.
My approach was to use a smaller model like Curie, set the temperature low, and get really good results—at roughly one-tenth the price. This, to me, was one of the most under-explored areas: prompt engineering to unlock cheaper models for real product tasks.
An example prompt I used while experimenting
Below is a snapshot-style example prompt format I played around with. The idea is:
- Provide a “Today:” anchor so relative dates (“tomorrow”, “next month”) become computable.
- Provide a few examples of input/output in a consistent schema.
- Keep temperature low for consistency (in my notes, temp = 0.22).
- Use a stop sequence to prevent the model from rambling (in my notes: Stop = “today”).
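To make the anchoring idea concrete, here's a minimal sketch (my own illustration, not part of the prompt) of the date arithmetic the model is implicitly doing once a "Today:" line pins down the calendar:

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def resolve(phrase: str, today: date) -> date:
    """Resolve a relative phrase ('tomorrow', 'wednesday') against the anchor."""
    phrase = phrase.strip().lower()
    if phrase == "today":
        return today
    if phrase == "tomorrow":
        return today + timedelta(days=1)
    if phrase.capitalize() in WEEKDAYS:
        # Next occurrence of the named weekday strictly after today.
        target = WEEKDAYS.index(phrase.capitalize())
        days_ahead = (target - today.weekday()) % 7 or 7
        return today + timedelta(days=days_ahead)
    raise ValueError(f"unrecognized phrase: {phrase}")
```

With the anchor set to a Monday, "tomorrow" lands on Tuesday and "wednesday" on the coming Wednesday, exactly the mapping the first few-shot example demonstrates. The point of the sketch is that none of this is possible without the anchor: "tomorrow" alone is not computable.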
```
Good with curie-instruct at temp .22
Stop = "today"

Today: Monday
User: "i'm available tomorrow or wednesday after 8pm"
Exceptions: none
Dates: Tuesday at 8pm, Wednesday at 8pm

Today: Tuesday May 25th, 2021
User: "I'm good June on the 18, 19 and 20th at 5pm"
Exceptions: none
Dates: June 18th at 5pm, June 19th at 5pm, June 20th at 5pm

Today: Saturday
User: "i'm available all week after 6 except Wednesday"
Exceptions: Wednesday
Dates: Sunday at 6pm, Monday at 6pm, Tuesday at 6pm, [No Wednesday], Thursday at 6pm, Friday at 6pm, Saturday at 6pm

Today: Sunday
User: "tomorrow through thursday at 9 is perfect"
Exceptions: none
Dates: Monday at 9pm, Tuesday at 9pm, Wednesday at 9pm, Thursday at 9pm

Today: Sunday April 11th, 2021
User: "How about next month around the 15th?"
Exceptions: none
Dates: May 15th

Today: Tuesday March 2, 2021
User: "Ideally the best time for me is next month on the 17th at 5pm or 9pm"
Exceptions: none
Dates: April 17th at 5pm, April 17th at 9pm
```
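Wiring this up is short. The sketch below builds the prompt with a live "Today:" anchor; the API call is shown commented out because the legacy Completions endpoint is what I used at the time, and the model name `curie-instruct-beta` plus the capitalized stop sequence `"Today:"` (rather than the `"today"` in my notes) are assumptions here, not verified settings:

```python
from datetime import date

# Abbreviated: in practice this string holds all of the examples above.
FEW_SHOT = """Today: Monday
User: "i'm available tomorrow or wednesday after 8pm"
Exceptions: none
Dates: Tuesday at 8pm, Wednesday at 8pm
"""

def build_prompt(user_text: str, today: date) -> str:
    """Append a fresh anchor and the user's text to the few-shot examples."""
    anchor = today.strftime("Today: %A %B %d, %Y")
    # Ending on "Exceptions:" primes the model to complete the schema.
    return f'{FEW_SHOT}\n{anchor}\nUser: "{user_text}"\nExceptions:'

# Hypothetical call against the legacy Completions API (sketch only):
# completion = openai.Completion.create(
#     model="curie-instruct-beta",  # assumed name for the instruct Curie variant
#     prompt=build_prompt("all week after 6 except Wednesday", date.today()),
#     temperature=0.22,
#     stop=["Today:"],  # keeps the model from generating a whole next example
# )
```

Ending the prompt on `Exceptions:` is the small trick that matters: the model has nothing left to do except fill in the two schema fields.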
This isn’t meant to be a universal perfect prompt; it’s an example of the kind of iterative, practical prompting I used to explore the behavior and reliability of the model. The “schema” is intentionally simple: Today / User / Exceptions / Dates. That simplicity helps the model stay consistent, and it makes the output easy to pass into downstream code.
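That downstream step is nearly trivial precisely because the schema is line-oriented. A sketch of the parser (the class and field names here are my own, hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Availability:
    exceptions: list = field(default_factory=list)
    slots: list = field(default_factory=list)

def parse_completion(text: str) -> Availability:
    """Turn the model's 'Exceptions: ... / Dates: ...' lines into structure."""
    result = Availability()
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "exceptions" and value.lower() != "none":
            result.exceptions = [v.strip() for v in value.split(",")]
        elif key == "dates":
            # Drop bracketed placeholders like "[No Wednesday]".
            slots = [s.strip() for s in value.split(",")]
            result.slots = [s for s in slots if not s.startswith("[")]
    return result
```

If the output drifts from the schema, `partition` simply finds no known key and the line is ignored, which is a reasonable failure mode for a validation layer to catch.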
Why this matters for product building
The key point is not “wow, the model knows dates.” The point is that there’s an entire class of product problems where users communicate in flexible natural language, but your software needs structured data.
Scheduling is a perfect example, but it’s hardly the only one:
- extracting fields from messages
- turning “messy” instructions into parameters
- normalizing variations in phrasing
- handling exceptions (“except Wednesday”)
These aren’t tasks that require the largest possible model. They often benefit more from:
- good examples
- a stable output format
- low temperature
- careful constraints (like stop sequences)
- some light prompt engineering iterations
And when you can solve them with smaller models, the economics change dramatically: features that seemed “too expensive” suddenly become feasible.
Closing thought
If you’re building applications and you find yourself thinking “users don’t follow instructions” or “we need a complicated UI to force structure,” it’s worth considering whether a small language model—properly prompted—can sit in the middle and do the unglamorous but extremely valuable job of translating human intent into structured output.
In my experience, that simple translation layer is one of the most useful things these models can do. And it’s been there for a long time.