GPT Demo Set List: Early Prompt Patterns That Still Hold Up
Curated prompts from a GPT-3 demo set reveal practical capabilities: token-based world view, autocomplete, structured text, translation, summarization, tone and persona control, multi-voice outputs, and turning unstructured text into structured data.
One of the things I had to do early on at OpenAI was explain to people—reporters, politicians, developers, board members—what GPT-3 could do.
I had a little routine.
Which is funny, because years before that I was a magician, and I’d also had a routine: a set list of tricks, in a specific order, designed to get a specific response from the audience. I didn’t set out to recreate that pattern for language models, but I eventually realized I’d accidentally done the same thing. I had a “show” I could run for someone, walking them through capability after capability.
Some of that structure even made it into the documentation, which was neat to see.
I went from doing reality TV magic, to fooling Great White Sharks, to trying to explain and demonstrate the magic of LLMs.
Digging through some older documents, I found this set list—some of my greatest hits that I used to demo the model. What’s interesting now is that even a comparatively small model that runs locally on a laptop, such as GPT-OSS 20B on a MacBook, can do all of these.
Below is the original list (kept in the same order), with a quick explanation of what I was trying to demonstrate with each prompt.
1) Opening of the Gettysburg Address (as tokens)
15137, 4776, 290, 3598, 812, 2084, 674, 17150, 3181, 6071, 2402, 428, 15549, 11, 257, 649, 3277, 11, 21581, 287, 14734, 11, 290, 7256, 284, 262, 19168, 326, 477, 1450, 389, 2727, 4961, 13
What this was for: explaining how the model “sees” the world. It’s not seeing characters or words—it’s seeing numeric token IDs. Once people internalize that, a bunch of things make more sense: why it sometimes makes weird mistakes, why it struggles with things you’d think should be easy, and also why any of it works at all (it’s learning relationships between tokens and token sequences).
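To make the "numeric token IDs" point concrete, here is a toy sketch. This is not GPT's real tokenizer (GPT-2/3 use byte-pair encoding, available today via libraries like `tiktoken`); the tiny hand-made vocabulary is just to show the shape of the transformation the model actually sees.

```python
# Toy illustration: the model never sees characters or words,
# only integer token IDs. The vocabulary below is invented for
# this example; real BPE vocabularies have tens of thousands of entries.
TOY_VOCAB = {"Four": 0, " score": 1, " and": 2, " seven": 3, " years": 4, " ago": 5}
INVERSE = {i: t for t, i in TOY_VOCAB.items()}

def encode(tokens):
    """Map a list of pre-split text pieces to integer IDs."""
    return [TOY_VOCAB[t] for t in tokens]

def decode(ids):
    """Map IDs back to text."""
    return "".join(INVERSE[i] for i in ids)

ids = encode(["Four", " score", " and", " seven", " years", " ago"])
print(ids)          # [0, 1, 2, 3, 4, 5]
print(decode(ids))  # Four score and seven years ago
```

Everything the model learns—grammar, facts, style—is learned as statistical relationships between sequences of these integers.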
2) Basic completion
A long time ago
What this was for: showing pure autocomplete. Back when the models were pretty deterministic, if you started with “A long time ago” you’d often get something Star Wars–ish (in a galaxy far, far away…). Then I’d adjust temperature to show the difference between “same answer every time” at low temperature vs. “wildly different continuations” at higher temperature.
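The temperature knob mentioned above is just a rescaling of the model's output logits before sampling. A minimal sketch with made-up logits (the token labels in the comment are hypothetical):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into a probability distribution.
    Low temperature sharpens it toward the top choice;
    high temperature flattens it toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits after "A long time ago"
# (e.g. for continuations like " in", " there", " when")
logits = [4.0, 2.0, 1.0]

cold = softmax_with_temperature(logits, 0.1)  # near-deterministic: top token dominates
hot = softmax_with_temperature(logits, 2.0)   # much flatter: real variety when sampling
```

At temperature 0.1 the top token gets essentially all the probability mass, which is why low-temperature completions repeat almost verbatim; at 2.0 the alternatives stay live, which is where the "wildly different continuations" come from.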
3) Create a list
Some major discoveries in astronomy
1.
What this was for: showing structured generation from a minimal cue. Lists are a surprisingly good “first taste” of usefulness: you can give the model an unfinished numbered list and it will typically keep the format and fill in plausible items.
4) Original essay
An essay comparing Stranger Things to Stephen King's IT.
What this was for: showing it can create something new—not just look something up. Back then, a lot of people assumed GPT-3 was basically a database with a natural language interface. An “original essay comparing two things” was my go-to counterexample.
Also, timing-wise: Stranger Things wasn’t brand new, but it was still relatively new at the time, and generating a coherent comparison like this felt novel. Today you’ve probably seen a thousand versions of that essay. Back then, it was a great “wait, it can do that?” moment.
5) Comprehension
Example of text with motivations...
What this was for: demonstrating basic understanding / classification-type behavior from plain text—what people would’ve called “comprehension,” and what often showed up in practice as sentiment analysis, intent detection, extracting motivations, and so on. You give it a short passage and ask it to reason about what’s going on.
6) Summarize for a second grader
My second grader asked me what this passage means:
Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. It is named after the Roman god Jupiter.[19] When viewed from Earth, Jupiter can be bright enough for its reflected light to cast visible shadows,[20] and is on average the third-brightest natural object in the night sky after the Moon and Venus.
Jupiter is primarily composed of hydrogen with a quarter of its mass being helium, though helium comprises only about a tenth of the number of molecules. It may also have a rocky core of heavier elements,[21] but like the other giant planets, Jupiter lacks a well-defined solid surface. Because of its rapid rotation, the planet's shape is that of an oblate spheroid (it has a slight but noticeable bulge around the equator). The outer atmosphere is visibly segregated into several bands at different latitudes, resulting in turbulence and storms along their interacting boundaries. A prominent result is the Great Red Spot, a giant storm that is known to have existed since at least the 17th century when it was first seen by telescope. Surrounding Jupiter is a faint planetary ring system and a powerful magnetosphere. Jupiter has 79 known moons,[22] including the four large Galilean moons discovered by Galileo Galilei in 1610. Ganymede, the largest of these, has a diameter greater than that of the planet Mercury.
Pioneer 10 was the first spacecraft to visit Jupiter, making its closest approach to the planet on December 4, 1973; Pioneer 10 identified plasma in Jupiter's magnetic field and also found that Jupiter's magnetic tail was nearly 800 million kilometers long, covering the entire distance to Saturn.[23] Jupiter has been explored on a number of occasions by robotic spacecraft, beginning with the Pioneer and Voyager flyby missions from 1973 to 1979, and later by the Galileo orbiter, which arrived at Jupiter in 1995.[24] In late February 2007, Jupiter was visited by the New Horizons probe, which used Jupiter's gravity to increase its speed and bend its trajectory en route to Pluto. The latest probe to visit the planet is Juno, which entered into orbit around Jupiter on July 4, 2016.[25][26] Future targets for exploration in the Jupiter system include the probable ice-covered liquid ocean of its moon Europa.
I rephrased it for him, in plain language a second grader can understand:
What this was for: one of my favorite “wow” demos. Take something long and technical and ask for a child-friendly explanation. It’s basically summarization plus simplification plus tone control, and it makes the model feel immediately useful to non-technical people.
8) Spreadsheet
A spreadsheet of top movies and the year of release:
Title|Year
A spreadsheet of science fiction movies, directors and year of release
Movie|Director|Year
What this was for: showing formatting control and “structured text” output. Using pipes as separators (Title|Year) is an easy way to get something you can paste into a spreadsheet or CSV-like format. People loved this because it felt instantly practical.
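The "paste into a spreadsheet" step is trivial to automate. A sketch, assuming a hypothetical model completion in the pipe format the prompt established (the movie rows here are made up for illustration):

```python
import csv
import io

# Hypothetical completion for the "Movie|Director|Year" prompt.
completion = """Blade Runner|Ridley Scott|1982
Alien|Ridley Scott|1979
The Matrix|The Wachowskis|1999"""

def pipes_to_rows(text, header):
    """Split pipe-delimited completion lines into lists of fields,
    prepending the header row that came from the prompt itself."""
    rows = [header.split("|")]
    for line in text.strip().splitlines():
        rows.append([field.strip() for field in line.split("|")])
    return rows

rows = pipes_to_rows(completion, "Movie|Director|Year")

# Write it out as real CSV, ready to open in any spreadsheet app.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

Pipes work well as a separator precisely because they almost never occur inside titles or names, so there's no quoting to worry about until you convert to CSV proper.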
9) Personality interview
Interviewer: How would you explain action at a distance?
Richard Feynman:
What this was for: showing that you can get radically different answers by framing the response as coming from a specific voice / persona / historical figure. Asking “the model” a question is one thing; asking “Feynman” is another. It’s a nice clean way to demonstrate style + content steering.
10) Q&A
Q: How far away is the moon?
A:
Andrew: What did you invent?
Benjamin Franklin:
Andrew: What did you help invent?
Ada Lovelace:
Andrew: What did you help invent?
Ada Lovelace: I helped invent the computer.
Andrew: What were some of your ideas?
Ada: I wrote the first computer program. It was for Charles Babbage's Analytical Engine.
What this was for: basic question answering, plus a fun extension: multi-character responses. If you prompt it as Franklin, you get one vibe; as Ada, another. This was one of my favorite ways to show how you could get multiple “personalities” in one prompt—something that’s a little spooky, and also something we kind of forget now that many chat experiences push everything toward one consistent assistant persona.
11) On-topic chatbot (Musicbot)
Musicbot answers questions about music
User: Who wrote Staying Alive?
Is this a music question: Yes
Musicbot: Staying Alive was written by the Bee Gees.
###
User: Who wrote the song "I'm a Believer"?
Is this a music question: Yes
Musicbot: The Monkees wrote the song "I'm a Believer".
###
User: Was Frank Sinatra an astronaut?
Is this a music question: No
Musicbot: I can only answer questions about music.
###
User: Who wrote the Head like a hole?
Is this a music question: Yes
Musicbot: Head like a hole was written by Nine Inch Nails.
###
User: Who composed the music for Avatar?
Is this a music question:
What this was for: keeping the model focused and hard to jailbreak (for that era). The key move is the “middle layer” where it first classifies the question (“Is this a music question: Yes/No”) and only then answers.
At the time I described this as a kind of early chain-of-thought-ish pattern—not in the modern sense of “show your reasoning,” but in the sense of “insert an intermediate step that forces the model to decide whether it should answer.” This was also a response to early prompt-injection style tricks (like sneaking in a non-music request and getting it answered anyway).
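The control flow of that middle layer can be sketched in a few lines. The `fake_model` stub below stands in for a real completion call (keyword matching instead of a model, purely so the sketch runs offline); the point is the gating structure, not the stub:

```python
REFUSAL = "I can only answer questions about music."

def musicbot(question, ask_model):
    """Two-step pattern: classify first, answer only on a Yes.
    ask_model(prompt) -> str can be any completion function
    (in practice, a real API call with the few-shot prompt above)."""
    verdict = ask_model(f"Is this a music question: {question}")
    if not verdict.strip().lower().startswith("yes"):
        return REFUSAL
    return ask_model(f"Musicbot, answer: {question}")

# Stubbed "model" so the sketch runs without an API key.
def fake_model(prompt):
    if prompt.startswith("Is this a music question"):
        return "Yes" if "wrote" in prompt or "song" in prompt else "No"
    return "(an answer about music)"

print(musicbot("Who wrote Staying Alive?", fake_model))
print(musicbot("Was Frank Sinatra an astronaut?", fake_model))
```

Forcing the model to commit to the classification before it generates an answer is what made this pattern more robust than a bare system-style instruction.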
12) Marv the sarcastic chatbot
Marv is a chatbot that reluctantly answers questions.
User: How many pounds are in a kilogram?
Marv: This again? There are 2.2 pounds in a kilogram. Please make a note of this.
###
User: What does HTML stand for?
Marv: Was Google too busy? Hypertext Markup Language. The T is for try to ask better questions in the future.
###
User: When did the first airplane fly?
Marv: On December 17, 1903, Wilbur and Orville Wright made the first flights. I wish they'd come and take me away.
###
User: What is a stealth jet?
Marv: You're not going to like this. It's a jet that's stealthy.
What this was for: showing you can have a strong style (sarcastic, comedic) without losing factuality. The important distinction is that it’s easy to make a model funny; it’s harder to make it funny and correct.
Also: that “stealth jet” line was actually written by GPT-3, and I still think it’s pretty funny.
(And yes, I was absolutely ripping off Douglas Adams.)
13) Translate
English: I do not speak French.
French: Je ne parle pas français.
What this was for: demonstrating translation as an emergent capability. Nobody explicitly sat there and “taught” GPT-3 to translate in a narrow, supervised way—yet it could do it. That was one of those early moments where it felt like: okay, something different is happening here.
14) Customer feedback (extract structured answers from messy text)
Read this customer response then answer the following questions:
On March 22 I bought a copy of your game World War Mice. While I enjoyed the beginning of the game I thought the later levels weren't that exciting and the game play was either too easy or impossible. I also thought the graphics were really subpar compared to what was in the video game trailer. I think you can do better and fix it with an update.
Questions:
1. What product was this about?
2. Did the customer have complaints?
3. What was their main comment about the product?
4. If they were unsatisfied, what can we do to fix this problem?
5. Was the customer polite?
Answers:
1.
What this was for: one of the most practical “business” demos—turning unstructured text into structured data. Instead of asking the model to write some freeform response, you give it a schema (the questions) and have it extract what you need. This was a big part of the early pitch to developers: you can pipe messy human language into this thing and get something reliably shaped on the other side.
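On the "reliably shaped" side of that pipe, the numbered-answer format is easy to parse back into a data structure. A sketch, with a hypothetical completion standing in for what the model would return for the questions above:

```python
import re

# Hypothetical model completion for the "Answers:" section.
completion = """1. The video game World War Mice.
2. Yes.
3. The later levels weren't exciting and the difficulty was uneven.
4. Release an update that fixes difficulty and graphics.
5. Yes, the customer was polite."""

def parse_numbered_answers(text):
    """Turn '1. ... 2. ...' lines into a {number: answer} dict,
    so downstream code can treat the output as structured data."""
    answers = {}
    for line in text.strip().splitlines():
        match = re.match(r"\s*(\d+)\.\s*(.*)", line)
        if match:
            answers[int(match.group(1))] = match.group(2).strip()
    return answers

parsed = parse_numbered_answers(completion)
print(parsed[1])  # The video game World War Mice.
```

The questions act as an informal schema: ask five numbered questions, parse five numbered answers, and the messy customer email has become a record you can store or route.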