Product management has changed in substance, not surface. The surface still looks the same: sprints, tickets, and the usual reviews. But the artefacts a PM owns are no longer just roadmaps and specs. What is being managed has changed shape, and it keeps reshaping itself faster than in any prior platform shift.

AI vocabulary doesn't change the craft, and neither do the tools. My first AI features sat on a twelve-month roadmap. They were assisted by AI; they were not rebuilt around it. The words were new; the craft was not. Too many roadmaps still look like this. For the teams behind them, and for their leadership, "agent", "eval" and "harness" remain jargon. They ship updates and report velocity. This is the more expensive failure: the appearance of adaptation without the outcomes.

When the cost of building falls by an order of magnitude, the cost of building the wrong thing rises by the same order, because a team can now ship far more of the wrong thing, far faster. In this terrain, judgment, empathy, and principled clarity don't disappear; they become your edge. The bottleneck is deciding what is worth building, what good means, and where to draw the lines. None of those decisions can be delegated. They are PM decisions, and they weigh more than before.

What is genuinely new.

Product management was built for deterministic software. Same input, same output. You scope the feature, ship it, instrument the funnel, and the system behaves. The constraints are technical: speed, capability, cost. Bugs are reproducible. Acceptance criteria can be written as assertions because the machine does the same thing every time you ask. AI systems do not. They return distributions, not values, and they change at an unprecedented rate.

Mobile was a new technology. Cloud was a new technology. Both changed what PMs built.

AI changes what PMs do.

The substrate is probabilistic. Mobile and cloud were new surfaces on top of deterministic substrates. The thing underneath behaved the same way every time, and the new craft was about new interaction surfaces and a new architectural set-up. AI flips this. The substrate itself is non-deterministic, and every product built on it inherits that property. The same prompt, run twice, can give different outputs, sometimes slightly different, sometimes catastrophically so. There is no settled method for validating probabilistic output deterministically at scale. The variance is the system.

The pace of evolution is different. Technology cycles have been compressing for decades, yet mobile and cloud still took years to mature. AI's pace differs in two ways. At the platform level, frontier models reset capability every few weeks. At the product level, the model generates output faster than the team can read it. A PM trying to review every output is reading yesterday's responses while today's are shipping.

The capability lives inside the product. The internet was transformative. Even so, the product the user touched was still what the team had built and deployed; the internet changed how it reached them, not what it was. Mobile and cloud were platform shifts: the capability sat in the infrastructure, with the product on top. With AI, the model is not under the product; it is the product, or close to it. The thing the customer interacts with is the thing that mutates. ChatGPT's behaviour shifted in December 2024, when OpenAI widely released its o1 reasoning models, a change felt in the output before it was acknowledged. In my own use, Perplexity returns different answers to the same query depending on which model it routes to, a decision made by Perplexity, not me. That is new, and most product management frameworks have no language for it yet.

The artefacts have changed.

PMs have to be fluent in the substrate. Three new artefacts sit at the centre of AI product work. None of them was part of the PM's toolkit three years ago.

Good is now measured, not declared. A spec describes what the system should do; an eval measures whether it actually does. For deterministic features the two nearly coincide. With AI, the spec is only the starting point. The eval makes it operational: a rubric the PM owns, an input set the PM curates, a threshold the PM defends. That is the new acceptance criterion, and the PM signs off on it the way a traditional PM signed off on a launch.
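A minimal sketch of that idea in Python, under stated assumptions: the curated input set is a list of `EvalCase` objects, the rubric is a pass/fail check per case, and the threshold is a single number the PM defends. All names here (`EvalCase`, `run_eval`, `meets_threshold`, `fake_support_bot`) are illustrative, not from any eval library, and the model call is a hypothetical stand-in. Real rubrics are often graded by people or by another model; the string checks are placeholders. Each case is sampled several times because one run of a probabilistic system proves little.

```python
import random
from dataclasses import dataclass
from typing import Callable

# Illustrative names only; not taken from any eval framework.


@dataclass
class EvalCase:
    """One curated input plus the PM-owned rubric check for its output."""
    prompt: str
    passes: Callable[[str], bool]


def run_eval(system: Callable[[str], str], cases: list[EvalCase],
             samples_per_case: int = 5) -> float:
    """Return the pass rate over all cases.

    Each case is sampled several times because the same input can
    produce different outputs from a probabilistic system.
    """
    passed = 0
    total = 0
    for case in cases:
        for _ in range(samples_per_case):
            if case.passes(system(case.prompt)):
                passed += 1
            total += 1
    return passed / total if total else 0.0


def meets_threshold(pass_rate: float, threshold: float) -> bool:
    """The acceptance criterion: ship only if the pass rate clears the bar."""
    return pass_rate >= threshold


# Hypothetical stand-in for a model call: seeded so the demo is repeatable,
# but it still varies its answer from call to call.
_rng = random.Random(7)


def fake_support_bot(prompt: str) -> str:
    return _rng.choice([
        "Refunds are processed within 5 business days.",
        "Refunds are processed within 5 business days.",
        "Please contact support.",
    ])


cases = [
    EvalCase("How long do refunds take?", lambda out: "5 business days" in out),
    EvalCase("When will I get my money back?", lambda out: "5 business days" in out),
]

rate = run_eval(fake_support_bot, cases, samples_per_case=20)
print(f"pass rate {rate:.0%}, ships: {meets_threshold(rate, threshold=0.9)}")
```

The design choice worth noticing is that the output of an eval is a rate, not a boolean: the PM's decision is where to set the threshold, and that number is the thing to defend in review.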

What ships is no longer just what is built. The build still happens: code is still compiled and services still deploy. But what determines the user's experience is no longer the build alone. It is the harness: the configured composition of model, prompts, tools, skills, retrieval, guardrails, and fallbacks that ships as one versioned object. As cost concerns grow, it matters that the harness is also where the product's unit economics are set. A different model, a deeper retrieval, an extra agent step: each is a financial decision dressed as a technical one.
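One way to make "one versioned object" concrete, sketched in Python under loud assumptions: the `Harness` fields, the model names and the per-token prices below are hypothetical, and cost is estimated from input tokens only. Hashing the whole configuration means a prompt tweak counts as a new release just as a model swap does, and the same object answers the unit-economics question for each variant.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

# Hypothetical prices per 1,000 input tokens; real pricing varies by vendor and model.
PRICE_PER_1K_TOKENS = {"large-model": 0.010, "small-model": 0.001}


@dataclass(frozen=True)
class Harness:
    """Everything that shapes the user's experience, shipped as one object.
    Illustrative fields only; not any framework's schema."""
    model: str
    prompt_version: str
    tools: tuple[str, ...]
    retrieval_top_k: int
    guardrails: tuple[str, ...]
    fallback_model: str
    agent_steps: int = 1

    def version(self) -> str:
        """Content hash: a change to any component yields a new version."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]


def estimated_cost_per_request(h: Harness, prompt_tokens: int = 800,
                               tokens_per_chunk: int = 300) -> float:
    """Rough input-token cost: each agent step re-reads the prompt plus retrieved chunks."""
    context_tokens = prompt_tokens + h.retrieval_top_k * tokens_per_chunk
    return h.agent_steps * context_tokens / 1000 * PRICE_PER_1K_TOKENS[h.model]


base = Harness(
    model="small-model",
    prompt_version="support-v3",
    tools=("order_lookup",),
    retrieval_top_k=4,
    guardrails=("pii_filter",),
    fallback_model="small-model",
)
deeper = replace(base, retrieval_top_k=8)     # a deeper retrieval
agentic = replace(base, agent_steps=2)        # an extra agent step
bigger = replace(base, model="large-model")   # a different model

for name, h in [("base", base), ("deeper", deeper), ("agentic", agentic), ("bigger", bigger)]:
    print(name, h.version(), f"${estimated_cost_per_request(h):.4f}")
```

Under these made-up numbers, doubling retrieval depth adds 60% to the per-request cost, an extra agent step doubles it, and the larger model multiplies it by ten; each looks like an engineering toggle in the config.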

Where variance lives is now a design decision. The deterministic-probabilistic boundary is a new design surface. It sits alongside feature scope, just as important and much newer. Where in the system do we tolerate variance, and where do we refuse it? The payment is deterministic. The rendered product details are probabilistic. The line between them is where the product's reliability is set, and where it sits is a PM call before it is an engineering one. It is one of the most consequential design decisions in the product. The PM who has not drawn it has handed it to someone, or something, else.
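A minimal sketch of drawing that line in code, assuming a hypothetical `CATALOGUE` as the system of record and a `describe` callable standing in for a model: the amount charged is computed only from the catalogue, while the generated description is allowed to vary but is blocked from stating a price. The regex guardrail is deliberately crude and purely illustrative; the point is where the line sits, not how the check is implemented.

```python
import re
from decimal import Decimal
from typing import Callable

# Hypothetical catalogue: the deterministic system of record for names and prices.
CATALOGUE = {"SKU-1": {"name": "Trail shoe", "price": Decimal("89.00")}}

# Crude, illustrative check for a currency amount in generated copy.
_PRICE_PATTERN = re.compile(r"[$£€]\s?\d")


def charge_amount(sku: str, quantity: int) -> Decimal:
    """Deterministic side: the amount charged comes only from the catalogue."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return CATALOGUE[sku]["price"] * quantity


def product_page(sku: str, describe: Callable[[str], str]) -> dict:
    """Probabilistic side: the description may vary run to run; name and price may not."""
    item = CATALOGUE[sku]
    description = describe(item["name"])
    # Guardrail at the boundary: generated copy may not state a price,
    # so fall back to a fixed template if it tries to.
    if _PRICE_PATTERN.search(description):
        description = f"{item['name']}."
    return {"name": item["name"], "price": str(item["price"]), "description": description}


print(charge_amount("SKU-1", 2))
print(product_page("SKU-1", lambda name: f"{name}, now only $49!"))
print(product_page("SKU-1", lambda name: f"{name}: light, grippy, built for wet rock."))
```

Note that the model never receives a code path to the charge; the variance it introduces is confined to copy the user reads.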

These three artefacts are the substance of the change. A company whose evals, harnesses and boundaries are sharper will ship better products, faster and more reliably, than competitors paying for the same API. As foundation models commoditise, the moat in AI is moving to the harness around them.

A PM who has internalised them is managing an AI product. A PM who has not is managing the part of the product that hasn't changed, while ignoring the part that has.

Why most teams haven't switched.

Incentive. Leaders still ask for roadmaps, and PMs supply them, drawing AI features as if they were SaaS features, because that is what reads as competent. The cost is that the roadmap describes a fiction; the actual product is moving in a way the roadmap cannot represent.

Understanding. The new craft is genuinely new. It is being invented in public, by the people building, week by week. There is no canonical textbook or classroom module, and the PM trying to learn cannot wait for one. Meanwhile, parts of the market have decided PMs should be vibe coding and vibe designing before they have learned the new craft. That is the wrong order. PMs tend to extend from their natural strengths; I have spent more time in Lovable and Figma than in Claude Code. But that tool work should never overshadow customer obsession or the core artefacts of AI-native product management.

Anyone trying to close this gap is doing it against performance reviews, OKR cycles, and the language of board reporting, all of which still measure something else.

The asymmetry of adoption is telling. Recent developer surveys put AI tool use among engineers higher and more consistent than among PMs. PMs are using AI to draft tickets, decks, and one-pagers. Far fewer are running evals, configuring harnesses, or designing always-on systems that amplify their own work.

The interface is being adopted by everyone; the craft, less so. That is the gap. It is not closing on its own.

What endures.

The questions are older than the discipline. People were answering them before the title "product manager" existed, and PMs will still be answering them after this cycle of artefacts has been replaced too.

What is worth building? For whom? How do we know it is any good? These are still answered with the same instruments: taste, judgment, restraint, and what Shreyas Doshi calls product sense, the intersection of empathy, curiosity, and knowledge. Customer obsession has not changed. The shape of product work (know the customer, test, build, measure, steward) has not changed either. The way each stage is done has changed, but that is a separate article.

The model is a powerful averager. Given a vague input, it returns the modal answer - competent, plausible, indistinguishable from a thousand others. Given a clarified, opinionated, specific brief, it compounds the brief. The clarity has to come from the human, before the model is asked. PMs who outsource their early thinking will produce more, faster, and converge on the same median product as everyone else using the same tools.

Product sense atrophies the way any muscle does. Through disuse.

Every technology cycle produces two kinds of operators. The first learn the new while keeping the old. The second mistake the new for a replacement. This craft will be self-taught. The PMs who learn the new artefacts and keep asking the old questions will define the next decade. The rest will be busy with the surface, productive at the wrong thing, faster.


Next: the primitives this work is made of.

Doing the Old Job, Faster.

AI Product Management | The job has changed; the questions have not.