Separating Instruction from Content: A Core Prompt Reliability Pattern
Clearly separate instructions from content with a reliable delimiter (three hashtags is often the strongest), and present structured data in Markdown, XML, or JSON to reduce ambiguity and improve model performance.
One of the quirky little things we had to discover early on with the GPT models was how to separate text—specifically, how to mark the difference between an instruction and the copy you wanted the model to operate on.
As we said before, these models weren’t explicitly trained as instruction-following models (user-assistant models where you give instructions and the model knows what it’s supposed to do). They were trained on large batches of text and were simply trying to complete whatever pattern came next. That meant that if you were trying to get the model to do something useful, like “replace every cat in this text with a dog,” you needed a clear way to signal: the instruction has ended, and the content begins here.
In the early days, we didn’t have the benefit of that behavior being trained into the model, so we had to discover natural separators. It turned out there were a few that worked reliably.
One of the strongest was just using three hashtags in a row:
###
This seemed to work pretty well. My intuition was that if you read things like press releases and other copy, that kind of marker is often used to separate the main body from “notes,” “instructions,” or metadata. I don’t know if that’s actually why the model understood it so well, but in practice it worked. It was a pretty good indicator that we were looking at something different.
Practical Example: Bad vs Good Separation
Mixed (ambiguous)
Rewrite this in a calmer tone and remove references to pricing. We had to raise prices this quarter because costs increased and we are adjusting enterprise tiers.
In this form, instruction and source text are blended, which increases ambiguity.
Separated (clear)
Task: Rewrite the content in a calmer tone.
Constraint: Remove references to pricing.
###
We had to raise prices this quarter because costs increased and we are adjusting enterprise tiers.
This form gives the model a clean boundary: instruction first, content second.
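The separated form can also be assembled programmatically, which keeps the boundary consistent across calls. A minimal sketch in Python; `build_prompt` is a hypothetical helper for illustration, not part of any SDK:

```python
# Minimal sketch: instruction first, then a delimiter line, then content.
def build_prompt(task: str, content: str, delimiter: str = "###") -> str:
    """Join instruction and content with a clear delimiter line."""
    return f"{task}\n{delimiter}\n{content}"

prompt = build_prompt(
    "Task: Rewrite the content in a calmer tone.\n"
    "Constraint: Remove references to pricing.",
    "We had to raise prices this quarter because costs increased "
    "and we are adjusting enterprise tiers.",
)
```

However the helper is shaped, the important property is that the delimiter always lands on its own line between the instruction and the content.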
And the same idea could be used to separate chapters and books, etc. Other separators also worked—several dashed lines, rows of equals signs—there wasn’t one hard rule about what worked better than anything else. But I found that three hashtags was consistently the strongest, and when I look back through my prompt examples, it’s the one I used most frequently.
That raised a later question as the models progressed: what’s the best way to present structured data?
Over time, it became common to see different patterns emerge. There was a period when you’d notice differences between models. If you worked with something like GPT-4 versus Claude, Claude tended to do pretty well with XML. It seemed to prefer it. And it seemed like the GPT models preferred Markdown—Markdown being a human-readable format that converts easily into HTML—whereas XML is a more formal format used inside document systems like Microsoft Word.
I also think that, for a while, Claude was stronger with writing style than the GPT models, and that suggested something about training. My guess is they were often training on the whole document, including the XML, which is a smart approach. Because if you strip away formatting (line breaks, chapter headings, anything that indicates where something new starts) and just provide raw text, it’s harder for the model to interpret the structure that’s already there.
Practical Example: Markdown vs XML Framing
Markdown-framed prompt
# Task
Summarize the memo for an executive audience.
## Constraints
- Keep under 120 words
- Preserve key risks
- Use neutral tone
## Source
The rollout missed two milestones due to vendor delays...
XML-framed prompt
<request>
  <task>Summarize the memo for an executive audience.</task>
  <constraints>
    <max_words>120</max_words>
    <preserve>key risks</preserve>
    <tone>neutral</tone>
  </constraints>
  <source>
    The rollout missed two milestones due to vendor delays...
  </source>
</request>
Both can work well; the point is consistent, explicit structure.
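Either framing can be generated from the same underlying fields, which makes it easy to stay consistent. A small sketch; the field names are illustrative, not a standard schema:

```python
from xml.sax.saxutils import escape

def to_markdown(task: str, constraints: list[str], source: str) -> str:
    # Markdown framing: headings mark the sections.
    lines = ["# Task", task, "## Constraints"]
    lines += [f"- {c}" for c in constraints]
    lines += ["## Source", source]
    return "\n".join(lines)

def to_xml(task: str, constraints: list[str], source: str) -> str:
    # XML framing: tags mark the sections; escape() guards <, >, and &.
    items = "\n".join(
        f"    <constraint>{escape(c)}</constraint>" for c in constraints
    )
    return (
        "<request>\n"
        f"  <task>{escape(task)}</task>\n"
        f"  <constraints>\n{items}\n  </constraints>\n"
        f"  <source>{escape(source)}</source>\n"
        "</request>"
    )
```

Whichever framing you pick, using one renderer for all your prompts is what keeps the structure explicit and uniform.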
But with XML, you implicitly include structure: chapter breaks, paragraph breaks, subheadings, and so on. That gives the model a better sense of what things are and where they are. It’s also a useful signal about what kind of document you’re looking at. Something in XML is probably a written document. Something in HTML might not be. Something in Markdown might be more technical—think README files on GitHub.
And sometimes it matters to delineate early what kind of writing you’re dealing with: technical writing versus more personal, stylistic writing. That was, in my view, one of the reasons Anthropic was on a smart path early on in building a system that was strong across different kinds of writing. I’d say those differences have largely vanished now as models have progressed and people have learned better techniques. But early on, it was an advantage to recognize explicitly that not all text is the same.
What’s interesting is that now OpenAI recommends using XML in some cases when you’re trying to provide specific instructions. Markdown is easier for most people to wrap their heads around; XML is more technical, but it’s been pointed out in documentation as one of the ways to help the model understand structure—particularly with the shift to the GPT-5 series. At the same time, we’ve also seen Markdown embraced by Anthropic for things like working with agents and for how humans naturally explain things to models.
So we’re in this space where both Markdown and XML are useful. But the core idea—delineating between instruction and content—hasn’t gone away.
We also find that sometimes you can get video models and image models to perform better if you give them raw JSON, which is highly structured. Now, is that because those systems “prefer JSON,” or because they prefer really, really structured inputs? That probably varies by model. But even to this day—six years after the launch of GPT-3—the more structured your inputs are, the better your outcomes tend to be, because you’re reducing ambiguity.
Practical Example: JSON for Structured Inputs
{
  "task": "Generate a 20-second product video script",
  "audience": "new users",
  "style": "clear, energetic",
  "must_include": [
    "problem",
    "solution",
    "call to action"
  ],
  "avoid": [
    "technical jargon",
    "unverified claims"
  ]
}
When inputs are this explicit, models spend less effort guessing your intent and more effort solving the task.
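A spec like this can be built in code and serialized, so the structure stays identical from call to call. A sketch using Python’s standard json module; the keys mirror the example above and are illustrative, not a required schema:

```python
import json

# Build the spec as a plain dict, then serialize it for the model.
spec = {
    "task": "Generate a 20-second product video script",
    "audience": "new users",
    "style": "clear, energetic",
    "must_include": ["problem", "solution", "call to action"],
    "avoid": ["technical jargon", "unverified claims"],
}
structured_input = json.dumps(spec, indent=2)
```

Serializing from a dict also gives you a natural place to validate the spec before it ever reaches the model.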
Even if a model can usually tell the difference between instructions and content, it still has to stop and think a little, and that matters. If you make the model spend effort resolving ambiguity, it isn’t using that effort on the actual problem. And it increases the chance you get a different answer: either because it came to the wrong conclusion about what you meant, or because it simply didn’t spend as much time on the task as it could have.