From plausible plan to evidence-based
This is the last of three posts this week on AI tooling in product development. Post one was the PRD workflow. Post two was using the tools to understand our own system. This one is about what the tools get wrong, and why that turned out to be the most useful part.
Midway through the PRD session I described in my first post, the AI had been reading our codebase and mapping out which parts of the system were built and which still needed work. It kept describing one component as "fully functional" and "already working." Confident tone. Clean summary. The kind of output you nod at and move on.
I asked: "What makes you think it is working at the moment?" Turns out it had found code on both sides — frontend components that make API calls, and backend routes that should serve them. But it had never seen the system running. No test results, no logs. It had inferred from the existence of code that the code must be working.
When I pushed, it said "I don't have evidence that it's working. I have evidence that code exists on both sides, and I made an assumption."
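The gap it admitted to is easy to make concrete. Here is a toy sketch (hypothetical code, not from our system) of a backend handler that reads statically as complete, yet fails the moment it actually runs, because a piece of runtime configuration was never set:

```python
# Hypothetical sketch: code that "exists" but has never been proven to work.
import os

# Ensure the config this demo depends on is absent, as in a fresh environment.
os.environ.pop("REPORTS_API_KEY", None)

def get_report(report_id: str) -> dict:
    # Reads as a finished route handler: takes an ID, returns a payload...
    api_key = os.environ["REPORTS_API_KEY"]  # ...but this was never configured
    return {"id": report_id, "key": api_key}

# Static reading says "fully functional". Running it says otherwise:
try:
    get_report("42")
    status = "works"
except KeyError:
    status = "exists, but does not work"

print(status)  # → "exists, but does not work"
```

Nothing in the source text of `get_report` reveals the problem; only executing it does. That is exactly the evidence a model reading a repo does not have.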
That moment was more valuable than any of the analysis it got right. And it happened more than once. At another point it described a system as "primarily talking to the old API." I asked how it knew. Turned out the routing configuration that would actually answer that question lives outside the repos it had access to — so it had filled in the gap with a reasonable-sounding guess.
I challenged a few other claims and each time it went back, dug deeper, and came back with something more complete and precise.
I think the thing people forget when working with these tools is that the output always sounds authoritative. The prose is clean, the structure is logical, and uncertain conclusions get presented with the same confidence as verified facts.
If you don't bring your own knowledge and quality bar to the conversation, you end up with a document that reads beautifully and is built on guesses nobody questioned.
The PRD we ended up with was significantly more useful than the first version. Not because I rewrote it, but because I kept asking questions and the AI kept revising, far faster than I could have done myself.
It went from being a plausible plan to something grounded in actual evidence. But only because someone kept insisting on the critical difference between "code exists" and "code works."


Always brilliant, Kendra. This is exactly what I find when pointing AI at code (as I'm doing right now because I have a cold and my partner has abandoned me for the day... humph).
Your experience was with a PRD, but the same pattern shows up in direct engineering work or in architecture design. Your point about the output always sounding authoritative is the key thing, and it's a bit scary.
Uncertain conclusions are presented with the same confidence as facts. Until the AI gets "better", or you have a second AI acting as a judge to flag where it might be making assumptions, the person working with it needs enough relevant knowledge to spot when it's filling a gap with a plausible-sounding "guess".
It always came back to asking it questions because something had a "smell" that triggered me.
But the bits where it was good: taking a different git project, parsing it to understand the logic, and converting that into a Mermaid sequence diagram so I could understand and use the API correctly, and then feeding it my client API logs. That stuff is a lifesaver.