Tool Makers vs Tool Users: Where Product Value Actually Lives

Real AI adoption comes from removing friction and focusing on usability, not just from expanding capability.

One of the things I’ve noticed is that when you start working on tools—whether it’s AI, coding, programs, applications, whatever—it can become surprisingly hard to think of yourself as a tool user when you’re a tool maker.

When you’re building tools, your brain goes to all the things you can make. All the features you can add. All the cleverness you can ship. But that’s not the same experience as being a tool user, sitting down with the thing on a random Tuesday, trying to get something done. And what a tool user needs is often different from what a tool maker thinks they need.

ChatGPT is the cleanest example I know of this.

ChatGPT launched in November 2022. It was really a research preview. I was in those meetings and we genuinely had no idea it was going to blow up like it did. People sometimes find that hard to believe because they look at it and go, “Look how incredible it is. Look how useful it is. How could you not know?”

But what people forget is that ChatGPT came out in November 2022, and the base model it was built on—GPT‑3.5—came out in March 2022.

So you had a model with essentially the same core capabilities out there for six months.

Inside the company, we thought GPT‑3.5 was really cool. I blogged about it. I showed examples. To me it felt like a big leap over GPT‑3—honestly, it was probably what many people would have expected GPT‑4 to be capable of. GPT‑3.5 was OpenAI basically asking: if we could rebuild GPT‑3 knowing what we know now, what would we do differently?

So why did ChatGPT change everything if it wasn’t a brand new, smarter base model?

Because what made ChatGPT special wasn’t raw capability. It was that it was post‑trained on examples of how to behave like an assistant.

A base model is trained on a raw body of text—just a giant pile of language. And the model, as if by magic, learns language, learns communication, learns patterns. That’s why LLMs are amazing. But by themselves, they require someone like me: someone who is willing to spend time figuring out how to prompt it, how to steer it, how to coax it into doing the thing you want.

I’d argue prompting was actually pretty easy. But even if it’s easy, it’s still friction. And for a lot of people, it’s not worth their time.

ChatGPT removed that friction.

It took the same underlying model and wrapped it in training that basically taught it: when a user asks something, answer like a helpful assistant. The user asks a question, the model knows the job: provide an assistant response.
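To make that friction concrete, here’s a minimal sketch of the difference from the user’s side. The function names and prompt text are mine, purely illustrative, not anything from OpenAI’s actual APIs: a base completion model needed the user to hand-engineer a few-shot prompt so the model would “continue” in the right pattern, while a chat interface just takes the question, because the assistant behavior was baked in by post-training.

```python
# Base-model era: the user hand-crafts a few-shot prompt so the raw
# text-completion model continues the Q/A pattern. This is the friction.
def base_model_prompt(question: str) -> str:
    return (
        "Q: What is the capital of France?\n"
        "A: Paris.\n\n"
        "Q: Translate 'good morning' into Spanish.\n"
        "A: Buenos dias.\n\n"
        f"Q: {question}\n"
        "A:"
    )

# Chat era: the user supplies only their question. No examples,
# no pattern-setting, no thinking about how the model "thinks".
def chat_request(question: str) -> list[dict]:
    return [{"role": "user", "content": question}]

question = "Make this email sound more professional."
print(base_model_prompt(question))  # multi-line, hand-engineered scaffolding
print(chat_request(question))       # a single plain message
```

The underlying model is the same in both cases; all that changed is how much prompt engineering the funnel demands of the user before anything useful comes back.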

In fact, capability-wise, ChatGPT wasn’t really “more capable” than GPT‑3.5, because it was the same model. You could even argue it lost certain capabilities in post‑training, because post‑training tends to steer models away from some behaviors and into others.

But from the user’s perspective, none of that mattered. What mattered was that it finally felt usable.

A couple other things helped too:

  1. It was free.
  2. There was basically no friction to try it—no subscription decision, minimal barriers, just open the page and use it.

So the funnel went from “You need to be curious, technical, and patient” to “Just type what you want.”

Here are a few examples from those early days:

“Write a letter.”
“Write a blog post.”
“Here’s some copy—make it sound better.”

No tricky prompting. No learning curve. No needing to think about how the model “thinks.”

And we were all caught off guard.

GPT‑4—the model that was supposed to be the big groundbreaking release—wasn’t going to ship until March 2023. And here we were, months earlier, and the world had already been blown away.

It didn’t take a smarter model. It took less friction.

To underscore how non-obvious this can be: internally, one of the founders, Ilya (this is a matter of record), tried ChatGPT, didn’t think it was good enough, and didn’t want to release it. And I get it. He has a very high standard.

But for many people, “good enough” is incredibly valuable. If your bar is “help me write a better email” or “fix my grammar” or “help me get unstuck,” ChatGPT was more than good enough. It was magical.

And that’s the point: we often don’t know where the friction is.

If you’re curious, go look up the early Dropbox story about how they found what was blocking user engagement. The TL;DR is that they finally had to bring in some normal people—some normies—and just watch them try to use the product. The team had taken a bunch of things for granted, and users kept failing in ways the builders couldn’t even imagine.

I think that’s where we are right now with AI.

There’s a lot of incredible capability. But it’s not easily explainable to people. And we haven’t eliminated the friction.

Which brings me to the current example: agents.

I think a clear example of where there’s a lot of friction today is agentic tools.

OpenAI released a tool a while ago called Operator. I loved it—because I understood what it could do, and I could get it to do useful work. But it also fell apart in ways that were deal-breakers. Something as simple as looking at a Google Doc and not knowing how to scroll down: that’s the kind of failure that makes a normal user say “this is broken” and never come back.

To me, it was obvious that this is the sort of thing you can train into the model. The more time it spends using the same tools humans use, the more it can learn those basic interactions—much like a person does.

The latest version of this idea, for me, is ChatGPT Atlas—OpenAI’s browser. I use it all the time, for all kinds of things. It’s a game changer for me.

There are other agentic browsers out there too—tools that weirdos like myself love—but they haven’t really had their “ChatGPT moment” with normies yet.

I think we’re heading toward that moment. It’ll happen once the capabilities are surfaced in a way people can understand, or the tools get good enough that when you sit down and ask it to do something reasonable, it just does it.

People ask me, “What do you even use it for?” And the honest answer is: almost anything tedious I have to do on the web.

Writing a bunch of emails. Cleaning up a spreadsheet. Filling out forms. All the little pieces of work that aren’t hard, just annoying.

I go into Atlas and use it. But my experience is friction-free largely because it feels like the early GPT‑3 days: I understand how it works and (more importantly) what it can’t do. So I’m not constantly trying tasks that are just outside its current limits.

With Operator, for example, if I asked it for a list of more than ~100 things, it might not do it. And I knew that, so I could work around it. But a normal user won’t. A normal user will say: “I asked for a list. It didn’t give me the list. This tool sucks.”

Now we’ve seen agentic tools improve considerably. OpenAI’s Codex app is a great example—it’s a phenomenal agentic app, and I can get it to do a lot of the things I wanted Operator to do. And as the models that power it improve, and as those learnings make their way into tools like Atlas, these tools will get better.

But I think the big problem right now is that the capability is jagged.

If somebody tries an agentic tool and it fails on a task that feels basic—filling out a form, navigating a site, crossing a login wall, dealing with a weird UI—they won’t think “oh, I hit an edge case.” They’ll think “this doesn’t work,” and they’ll move on.

Every time you ask it to do a thing it should be able to do, and it doesn’t, it makes you not want to use it.

So adoption depends on two things:

  1. Making the underlying models more capable (so fewer jagged edges).
  2. Being clearer about what the tool is for (so users don’t constantly walk into failure modes).

ChatGPT’s killer application was language: emails, writing, rewriting, documents. Atlas, as an interface to the entire web, is trying to cover a much bigger surface area. That makes it harder to explain and harder to make reliable.

But I do think we’ll get there. As we define what agentic browsers are actually useful for—especially for normies—and as reliability improves, the use cases will crystallize.

I also have a theory about how those use cases will really land.

I think we’re going to define what these tools are used for through applications.

A lot of people imagine one app that does everything. OpenAI is exploring a different approach: ChatGPT connects into a bunch of different apps. And that might be the way this becomes real for people.

Because personally, I would rather have an email app that knows what ChatGPT is doing than do my email inside ChatGPT. I suspect that’ll be true for a lot of workflows. In many cases, we don’t actually need to do the work “in a browser” at all—we need the right product surface, with AI baked into it in a way that reduces friction instead of adding more.

That’s the thread connecting all of this for me.

We keep thinking the next breakthrough is going to come from a smarter model.

Sometimes it will. But a lot of the time, the breakthrough is going to come from removing friction—especially the friction tool makers don’t even notice.