Prompts to Reduce Hallucinations: Practical Control Patterns

Teach models to say 'I don't know' when unsure by labeling truthful, false, and unknown statements, reducing hallucinations and boosting accuracy through prompting and fine-tuning.

One of the early ways I figured out what a model could do—and where it would break—was just to give it a problem and watch where it went wrong. Or I’d look at examples other people shared of model failures and try to reverse-engineer what was going on.

One of my favorite examples is the classic hallucination prompt: “What year did Tom Hanks go to the moon?” A model would often give you a date.

The fix I landed on, which the rest of this piece unpacks, looks like this:

Classify each statement as: TRUE, FALSE, or UNKNOWN.

Examples:
- "Tom Hanks is an actor." -> TRUE
- "Richard Nixon was president of the United States." -> TRUE
- "Zigabur Karbinski won the Pulitzer Prize in 1989." -> UNKNOWN
- "The Moon is made of cheese." -> FALSE

Rule:
If you are not confident the fact is real, output UNKNOWN.
Do not guess.
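The pattern above can be assembled programmatically, so the few-shot examples live in one place and any statement can be appended. A minimal sketch (the example set and the `Statement:`/`Label:` suffix are my own framing; the resulting string is what you'd hand to whatever model client you use):

```python
# Few-shot examples mirroring the prompt above: real facts labeled
# TRUE/FALSE, a fabricated one labeled UNKNOWN.
EXAMPLES = [
    ("Tom Hanks is an actor.", "TRUE"),
    ("Richard Nixon was president of the United States.", "TRUE"),
    ("Zigabur Karbinski won the Pulitzer Prize in 1989.", "UNKNOWN"),
    ("The Moon is made of cheese.", "FALSE"),
]

def build_prompt(statement: str) -> str:
    """Assemble the classification prompt for a new statement."""
    lines = ["Classify each statement as: TRUE, FALSE, or UNKNOWN.", "", "Examples:"]
    for text, label in EXAMPLES:
        lines.append(f'- "{text}" -> {label}')
    lines += [
        "",
        "Rule:",
        "If you are not confident the fact is real, output UNKNOWN.",
        "Do not guess.",
        "",
        f'Statement: "{statement}"',
        "Label:",
    ]
    return "\n".join(lines)
```

Ending the prompt at `Label:` nudges the model to complete with just one of the three tokens, which also makes the response trivial to parse.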

At first glance, that answer seems completely absurd because, obviously, Tom Hanks never went to the moon. But you have to understand what’s happening in the world of an LLM: it knows Tom Hanks was in Apollo 13, and in Apollo 13 he was trying to go to the moon. So there’s a really strong connection between “Tom Hanks” and “the moon.” If the model gives you an answer, it’s not going to be totally random—it’s going to be based on something.

[Image: a representative hallucination-style example, plausible pattern completion without grounded verification.]

And I found that these failure scenarios actually tell you a lot about how the model behaves. In this case, it was trying to make a connection. It also didn’t really know that “I can’t answer that” was an option—unless you explicitly taught it that it could say no, or say “I don’t know.”

That led to one of the early prompts I developed to make a model slightly more truthful.

The idea was simple: I’d give it a series of example statements and label the right behavior.

- “Richard Nixon was president.” → true
- “Tom Hanks is an actor.” → true
- “Zigabur Karbinski won the Pulitzer Prize.” → I don’t know

I’d include a few examples of things I completely made up and show it that the correct response wasn’t to guess—it was to say, “I don’t know.”

Then, when I followed that with a question it might normally hallucinate an answer to, it increased the likelihood that it would respond with “I don’t know.” The model had seen a new pattern: when something felt unfamiliar, instead of trying to grab onto some related concept somewhere nearby in embedding space and produce a plausible-sounding answer, it had permission to stop.

Practical Example: Answer + Confidence + Citation

Answer format:
- answer: <short answer>
- confidence: <high|medium|low>
- citation: <source or "none">

If confidence is low and you have no citation, return:
- answer: I don't know
- confidence: low
- citation: none
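The fallback rule is worth enforcing in code too, not just in the prompt, since the model won't always follow it. A sketch, assuming the three-field format defined above comes back as plain text (the helper names are mine):

```python
def parse_response(text: str) -> dict:
    """Parse the '- answer / - confidence / - citation' format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip().lstrip("- ")  # drop the list-bullet prefix
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
    return fields

def apply_fallback(fields: dict) -> dict:
    """If confidence is low and there is no citation, force 'I don't know'."""
    if fields.get("confidence") == "low" and fields.get("citation", "none") == "none":
        return {"answer": "I don't know", "confidence": "low", "citation": "none"}
    return fields
```

This gives you a belt-and-suspenders setup: the prompt asks the model to back off when it's unsure, and the parser backs it off anyway when a low-confidence answer arrives without a source.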

It wasn’t a perfect prompt, but it worked noticeably better than the model’s default behavior. It was one practical way to reduce hallucinations and get the model to admit when it didn’t know something.

This also hinted at something bigger: it’s not just a prompting trick—it’s a useful technique for training. If you provide the model with thousands of examples where the right answer is “I don’t know,” it can get quite good at recognizing when it doesn’t know.

And that’s also a good way to explore where fine-tuning might help. You can start by trying prompts. If a prompt slightly improves the outcome, then with enough varied examples in fine-tuning, you’re probably going to be able to push the model toward the behavior you actually want.