Lessons from Building a First-Pass AI PRD Reviewer at Uber


Most product organizations have some version of a review process. Typically, once PMs have an early draft of a PRD (Product Requirements Doc) ready, it's circulated across design, engineering, legal, operations, science, and product leadership. That process is designed to improve quality and reduce risk. In practice, it often reveals a harder reality: PMs may be making decisions in systems where the relevant context extends far beyond what any one person can easily assemble on their own.

A PRD may reach the review stage with an unsupported headroom assumption, a blind spot in how the feature might affect adjacent systems, an unexamined second-order effect, or a policy-sensitive change without the guardrails reviewers expect. In other cases, the team may be unknowingly revisiting a hypothesis that was already explored in a smaller experiment or adjacent effort, but the relevant context is scattered across docs, decks, dashboards, and institutional memory.

At that point, the review process tends to pivot to lower-level discovery work: surfacing adjacent impacts, reconstructing prior context, and identifying questions that would have been more useful to address earlier. That slows teams down, consumes reviewer attention on issues that could have been surfaced earlier, and makes feedback inconsistent.

The real problem isn't that PMs lack rigor. It's that product work often requires a 360-degree view that's difficult to assemble manually in the moment: adjacent impacts, partner concerns, prior experiments, hidden dependencies, and the questions senior reviewers are likely to ask.

That was the problem we set out to solve.

Why This Matters at Uber

At Uber, product development runs through a structured checkpoint process that gives leadership and cross-functional teams visibility, accelerates approvals, and drives consistent execution. But a checkpoint process is only as effective as the quality of the materials entering it.

We saw an opportunity to strengthen that workflow further by helping PMs surface critical questions earlier. Rather than changing the checkpoint process itself, the goal was to improve the quality of what entered it.

That led us to a simple question, and ultimately to the PRD Evaluator: what if every PM had a fast, contextual first-pass reviewer before a PRD reached the broader approval process?

Role of the AI-Powered PRD Evaluator

The PRD Evaluator is an AI-powered reviewer that starts with a PRD and assembles a broader knowledge base around it: linked documents, related decks and meeting notes, prior experiments, cross-functional artifacts, and preloaded Uber-specific context like core principles, metric definitions, and key jobs to be done. It uses that context to return a structured assessment of launch readiness.

Its role is deliberately focused: strengthen the PRD before it reaches high-cost review forums. Not to replace senior judgment, but to help teams enter those conversations with stronger context and fewer avoidable gaps. It sits upstream of the approval system and improves the quality of what enters it.

For us, that meant building a system that helps PMs do several things earlier and better:

  • Identify the most important gaps in a draft
  • Surface adjacent impacts and cross-functional dependencies
  • Uncover prior learnings that may not be obvious to the current team
  • Enter checkpoint and review forums with a stronger artifact

How It Works: Four Steps From Draft to Actionable Scorecard

We didn't want a generic writing tool that merely rewarded polished prose. A PRD can be well-written and still miss the context, framing, or decision logic that determines whether it will hold up in review.

Figure 1: Overview of how the PRD Evaluator works: share a PRD link, gather context from related documents, evaluate across dimensions, and receive a scorecard with ratings and action items.

1. Build a Broader Knowledge Base Around the PRD

The evaluator uses the PRD as an entry point, then harnesses AI to search across relevant company artifacts and linked material to assemble the context needed to evaluate the decision well: related documents, prior experiments, cross-functional inputs, and preloaded Uber-specific context.
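The post doesn't describe the retrieval mechanics, but one minimal sketch of this context-assembly step is a breadth-first expansion from the PRD out to its linked artifacts, seeded with preloaded company context. Here `get_doc` and `linked_ids` are hypothetical stand-ins for whatever internal APIs resolve a document and its outbound links:

```python
from collections import deque

def build_knowledge_base(prd_id, get_doc, linked_ids, preloaded_context, max_docs=50):
    """Assemble review context by expanding breadth-first from the PRD.

    Illustrative sketch only: the real system presumably adds relevance
    ranking and source-type handling (decks, dashboards, experiments).
    """
    context = list(preloaded_context)  # metric definitions, core principles, etc.
    seen, queue = {prd_id}, deque([prd_id])
    while queue and len(seen) <= max_docs:
        doc_id = queue.popleft()
        context.append(get_doc(doc_id))          # pull the document body
        for link in linked_ids(doc_id):          # follow outbound links once
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return context
```

The `max_docs` cap is an assumption: unbounded link-following would quickly pull in the whole document graph, so some budget on context size is needed.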

2. Classify the PRD to Calibrate Review Depth

Not every PRD needs the same scrutiny. The evaluator classifies each proposal and calibrates accordingly:

  • Lighter review for UX parity or discoverability changes
  • Moderate review for incremental workflow changes or internal tooling migrations
  • Full review for net-new capabilities
  • Full review with specialized scrutiny for policy, pricing, or market changes
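The tiering above can be expressed as a simple decision rule. This is an illustrative sketch under stated assumptions, not the production logic: the real evaluator presumably classifies with an LLM rather than static rules, and the change-type labels here are invented for the example.

```python
from enum import Enum

class ReviewDepth(Enum):
    LIGHT = "light"                        # UX parity or discoverability changes
    MODERATE = "moderate"                  # incremental workflow changes, tooling migrations
    FULL = "full"                          # net-new capabilities
    FULL_SPECIALIZED = "full_specialized"  # policy, pricing, or market changes

# Hypothetical sensitive areas that always trigger specialized scrutiny
SENSITIVE_AREAS = {"policy", "pricing", "market"}

def classify(change_type: str, areas: set) -> ReviewDepth:
    """Map a proposal's change type and touched areas to a review depth."""
    if areas & SENSITIVE_AREAS:
        return ReviewDepth.FULL_SPECIALIZED
    if change_type == "net_new":
        return ReviewDepth.FULL
    if change_type in {"workflow_change", "tooling_migration"}:
        return ReviewDepth.MODERATE
    return ReviewDepth.LIGHT
```

Note the ordering: sensitivity checks run first, so even a small UX change that touches pricing gets the specialized full review.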

3. Assess Launch Readiness Across Multiple Dimensions

The review is structured around several dimensions, including:

  • Opportunity and Hypothesis: Is the problem real, and is success defined clearly enough to evaluate?
  • Product Scope: Is the proposal understandable, well-scoped, and decision-ready?
  • User Experience and Impact: Does the experience work well across user segments, geos, and potential edge cases?
  • Metric and Data Rigor: Does the PRD define success, guardrails, and a credible validation approach?

4. Produce a Scorecard Built for Action

Rather than a wall of comments, the evaluator produces a structured scorecard:

  • A launch-readiness rating
  • Dimension-by-dimension assessments
  • A clear “start here” pointer to the most important fix
  • For each gap, what’s missing, write-ready replacement text suggestions, and evidence from linked docs or prior experiments
  • Prioritized action items split into critical requirements and optimizations
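As a rough illustration, the scorecard's shape (using the statuses shown in Figure 2) might be modeled like this. The field names are assumptions made for the sketch, not the evaluator's actual schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Finding:
    dimension: str                # e.g. "Metric and Data Rigor"
    whats_missing: str            # description of the gap
    replacement_text: str         # write-ready suggestion the PM can paste in
    evidence: List[str] = field(default_factory=list)  # linked docs, prior experiments

@dataclass
class Scorecard:
    readiness: str                # "Ready" | "Ready with Caveats" | "Not Ready"
    dimension_scores: Dict[str, str]  # dimension -> "Looks Good" | "Needs Review"
    start_here: str               # pointer to the most important fix
    findings: List[Finding]
    critical_requirements: List[str]  # must fix before review
    optimizations: List[str]          # nice-to-have improvements
```

Splitting `critical_requirements` from `optimizations` mirrors the prioritization goal described above: the output tells the PM what to fix first rather than flagging everything equally.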

The output is designed to do more than point out weaknesses. It's meant to make the next round of revision easier and more targeted, and the next review conversation higher signal.

Figure 2: Summary of the PRD Evaluator output format: a launch-readiness rating (Ready, Ready with Caveats, or Not Ready), six dimension scores (Looks Good or Needs Review), detailed findings and fixes including replacement text and evidence, and action items split into critical requirements and optimizations.

Figure 3: Illustrative scorecard example.

Where the Value Shows Up for PMs

The biggest value is that it changes the quality and timing of product thinking.

It Expands a PM’s Field of View

Many of the hardest product mistakes come from incomplete visibility. A PM may not know that a similar hypothesis was tested earlier by another team. They may not realize a metric is ambiguous or missing an obvious guardrail. They may not see a downstream operational dependency because it sits outside their immediate product surface.

A truly useful evaluator expands that field of view. It can connect a draft to prior artifacts, adjacent efforts, pre-existing hypotheses, and missing questions the author has access to but that would otherwise depend on someone else remembering them in a meeting. It can also surface context that was never explicitly linked in the PRD but is still relevant to understanding the decision.

It Makes Self-Assessment More Structured

Most PMs can tell when a doc feels weak. The harder question is why it's weak and what to fix first.

The evaluator makes that diagnosis more explicit. Instead of vague unease, the PM gets a structured view of missing fundamentals: unsupported headroom assumptions, undefined guardrails, blind spots in how a change might affect adjacent systems, or risks that need acknowledgement.

It Improves the Quality of Review Rooms

When a PRD reaches a reviewer in better shape, the discussion moves faster toward tradeoffs, prioritization, and judgment, and less time is spent recovering context. That's where the evaluator connects most directly to Uber's product development system.

It Turns Critique Into Usable Revision

The most important design choice in the system wasn't scoring. It was ensuring actionability.

PMs don’t benefit much from comments like “be more specific” or “think through downside risk”. The evaluator is most useful when it converts critique into revision guidance: define the baseline, name the target, add the guardrail, scope the first launch more narrowly, acknowledge the risk, or make the dependency explicit.

That changes the workflow from passive critique to active improvement.

Early Adoption

Early usage validated the core value: the evaluator helped IC PMs uncover blind spots early, pressure-test unsupported headroom assumptions, surface how a proposed change might affect adjacent systems that weren't core to their role, and identify experience improvements within the scope they had already defined.

In early internal usage, the evaluator has already been used by dozens of PMs across Uber.

The tool's value shows up when PMs can bring it into their normal drafting and review workflow, strengthen the fidelity of what enters review, and help reviewers focus on higher-signal questions.

What We Learned

A few lessons stood out as we built and tested the evaluator:

  • Frameworks beat generic critique. Broad comments rarely help teams move faster. The leverage comes from a framework tied to actual decision criteria and failure modes.
  • Context matters as much as language quality. Many critical signals live outside the PRD itself, and richer context often reveals a different set of blind spots than the doc alone.
  • Hard boundaries make output more honest. Defining a small set of critical gaps helped the evaluator avoid calling a PRD review-ready when the fundamentals were missing.
  • Prioritization is part of the product. A review tool that flags everything as important isn't helping. The evaluator's value comes from telling PMs what to fix first.
  • The best AI output improves human conversations. The strongest sign the evaluator was working was that later review discussions became sharper and faster.

Where Human Judgment Still Matters

The evaluator doesn't aim to make final approval decisions or replace domain experts. The tool is most useful when it strengthens the artifact before expert review.

The hardest part of product development is getting the right people to make the right decisions at the right time, using an artifact strong enough to support those decisions.

Most product organizations have some equivalent of checkpoints, review boards, or gated approvals. The names differ, but the challenge is the same: how do you make sure the artifact entering the process is strong enough for the process to do real work?

AI has real leverage here as a structured thought partner that expands context, surfaces blind spots, and sharpens judgment before a decision reaches a high-cost forum. That's why we built the PRD Evaluator. And based on what we've seen so far, we think this pattern (AI that strengthens the input to human decision-making) will matter well beyond one company or one tool.

Acknowledgments 

Cover Image Attribution: Created by Gemini

Scorecard Images Attribution: Created by Claude