Small Capabilities, Big Ramifications in Prompt Design

Expanding capabilities such as larger context windows and structured representations like arrays unlock significant practical gains, enabling handling of large codebases and the creation of more complex games.

Sometimes it can be little things—little capabilities that don’t get much attention at the time—that suddenly end up having much bigger ramifications later on.

Other times it’s something we already want, and then we discover it doesn’t just do the useful thing we were hoping for; it opens up an entirely new universe of opportunities.

A good example of that second category is increasing context size.

When you first start thinking about what happens if you can increase the context size and put more text into prompts, it’s easy to land on the obvious benefits. Okay, great: I can paste in bigger articles and have it summarize larger pieces. I can search through longer documents. Maybe I don’t need to use a database for certain tasks. That kind of thing.

But what I’ve seen—especially in code—is that bigger context changes the game in ways that aren’t obvious at first. It means I can throw much larger chunks of code into the context, sometimes entire codebases. And as the models get better at reasoning through the whole context (not just doing needle-in-a-haystack search), they become able to do a lot more than we expected.

At this point, I can take an entire novel I’ve written, put it into the model, and ask: “Who are the characters?” “Are there any loose ends?” That still feels kind of crazy to me, coming from the days when I could put maybe a thousand words into GPT-3 and hope it would answer a few questions. Now it can process something like a hundred thousand words and give genuinely interesting insights.

But there’s another pattern I’ve noticed: sometimes a capability looks small, almost boring, and then you realize it has all these other applications.

With GPT-3.5, I noticed it could handle arrays: bracketed sequences of values. It could even handle arrays of arrays. That might not sound like a big deal, but it was significant to me because it meant the model could start to understand things like two-dimensional space.

And that led to a pretty big leap from GPT-3 to GPT-3.5 when it came to creating games.

I built a number of games with GPT-3.5 that would have been basically impossible with GPT-3, including a very minimal version of The Legend of Zelda. A lot of games use scene descriptions that are literally arrays of dots or numbers or other values. If the model can’t reliably represent and manipulate that kind of structure, the whole thing falls apart. If it can, suddenly you can do a lot.
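To make the idea concrete, here is a minimal sketch (my own illustration, not code from any of the games mentioned) of what such a scene description looks like: an array of arrays, where each inner array is one row of the map and each value is a tile. The tile symbols, `render`, and `find_player` are all hypothetical names chosen for the example.

```python
# A game scene as an array of arrays: each inner list is a row,
# each value a tile. This is the kind of nested structure a model
# must represent and manipulate reliably for a grid-based game.
WALL, FLOOR, PLAYER = "#", ".", "@"

scene = [
    [WALL, WALL,  WALL,   WALL,  WALL],
    [WALL, FLOOR, PLAYER, FLOOR, WALL],
    [WALL, FLOOR, FLOOR,  FLOOR, WALL],
    [WALL, WALL,  WALL,   WALL,  WALL],
]

def render(grid):
    """Flatten the 2-D array into the text a model would see in a prompt."""
    return "\n".join("".join(row) for row in grid)

def find_player(grid):
    """Locate the player tile by (row, column) index."""
    for r, row in enumerate(grid):
        for c, tile in enumerate(row):
            if tile == PLAYER:
                return r, c

print(render(scene))
print(find_player(scene))  # (1, 2)
```

Asking a model to “move the player one tile right” amounts to editing this nested structure consistently, which is exactly the capability that separated GPT-3.5 from GPT-3 here.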

That was one of the moments where I realized we were moving quickly.

I loved working on GPT-4, and the launch was an exciting thing to be a part of. But seeing those improvements in GPT-3.5—both with arrays and with larger context—really made it clear to me that we were on a trajectory where things were going to keep getting better and better.