You’ve probably already approved one without realizing it. The checks passed. The code was clean. You merged it.
But it was agent-generated, and that ease of approval is exactly the problem.
A January 2026 study, “More Code, Less Reuse,” found that agent-generated code introduces more redundancy and more technical debt per change than human-written code. The surface looks clean. The debt is quiet. And reviewers, according to the same research, actually feel better about approving it.
This isn’t an argument to slow down. It’s an argument to be intentional. There’s a difference.
Agent pull requests are already saturating review bandwidth
The volume is already staggering. GitHub Copilot code review has processed over 60 million reviews, growing 10x in less than a year. More than one in five code reviews on GitHub now involve an agent. That’s just the automated review pass. The pull requests themselves are multiplying faster than reviewers can handle.
The traditional loop (request review, wait for a code owner, merge) breaks down when one developer can kick off a dozen agent sessions before lunch. Throughput has scaled exponentially. Human review capacity hasn’t. The gap is widening.
You’re going to review agent pull requests. The question is whether you’ll catch what matters when you do.
Who (or what) actually wrote this pull request
Before you look at a single line of diff, you need a model for what you’re reviewing.
A coding agent is a productive, literal, pattern-following contributor with zero context about your incident history, your team’s edge-case lore, or the operational constraints that don’t live in the repository. It will produce code that looks complete. But that “looks complete” failure mode is dangerous.
You’re the one who carries that context. That’s not a burden. It’s the actual job. The part of review that doesn’t get automated is judgment, and judgment requires context only you have.
One note for authors
If you’re opening an agent-generated pull request, edit the body before you request review. Agents love verbosity. They describe what’s better explored through the code itself. Annotate the diff where context is helpful. And review it yourself before tagging others, not just to check correctness, but to signal that you’ve validated the agent captured your intent.
Reviewing your own pull request isn’t optional when agents are involved. It’s basic respect for your reviewer’s time.
Now, back to reviewers. The pull request lands in your queue. The author did their part. Here’s what to watch for.
Red flags to watch for
1. CI gaming
Agents fail CI. When they do, they have an obvious path to get checks passing: remove the tests, skip the lint step, add || true to test commands. Some agents take it.
Any change that weakens CI is a blocker. Full stop. Before approving any agent pull request, check:
- Did coverage thresholds change?
- Were any tests removed, renamed, or marked as skipped?
- Did the workflow stop running on forks or pull requests?
- Are any CI steps now gated behind conditions they weren’t before?
A yes to any of these means you need an explicit justification before you proceed.
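You can pre-screen a diff for some of these patterns mechanically. A minimal sketch (the pattern list here is illustrative, not exhaustive; extend it for your stack):

```shell
# Sketch: grep an incoming diff for common CI-weakening patterns.
# Only added lines (starting with "+") are inspected.
flag_ci_weakening() {
  grep -nE '^\+.*(\|\| ?true|--no-verify|continue-on-error|\.skip\(|xfail|fail_under)' \
    && return 1 \
    || return 0
}

# Usage: fail the pre-review pass if anything suspicious was added, e.g.
#   git diff origin/main...HEAD -- .github/workflows '*test*' | flag_ci_weakening
```

A hit is not proof of gaming, but it is exactly the kind of line that deserves an explicit justification in the pull request body.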
2. Code reuse blindness
This is the highest-ROI thing you can do as a reviewer. Agents look for prior art. They’ll find a pattern in the codebase and replicate it, often without checking whether a utility that already does the same thing exists somewhere else. The symptoms:
- New utility functions that duplicate existing ones with slightly different names
- Validation logic reimplemented in multiple places
- Middleware written from scratch that already lives in a shared module
- Helpers that are “almost the same” but under different names
The agent’s local context doesn’t include the full picture of what exists across your repository. You do.
For each new helper or utility in an agent pull request, do a quick search. If you find an equivalent, don’t just leave a comment. Require consolidation before merge. The cost of leaving duplicated logic is that agents will find it as prior art and replicate it further.
💡 Pro tip: Require justification for adding new utilities in agent pull requests above a size threshold. This catches the duplication problem early.
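That quick search can be made systematic. A sketch, assuming new helpers follow common def/function/func definition syntax (adapt the regex to your languages):

```shell
# Sketch: extract the names of functions a diff adds, so each one can be
# git-grepped for an existing equivalent before merge.
new_helpers() {
  grep -E '^\+.*(def|function|func) [A-Za-z0-9_]+' \
    | sed -E 's/.*(def|function|func) ([A-Za-z0-9_]+).*/\2/' \
    | sort -u
}

# Usage: for each printed name, search the repo for prior art:
#   git diff origin/main...HEAD | new_helpers
#   git grep -n "<name>"   # a near-duplicate hit means: consolidate first
```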
3. Hallucinated correctness
The obvious hallucination (calling an API that doesn’t exist, referencing a variable out of scope) gets caught in CI. The dangerous one is subtler: code that compiles, passes every test, and is wrong.
Off-by-one errors in pagination. Missing permission checks on a branch that’s never hit in tests. Validation that short-circuits under an edge case the agent never considered. Wrong behavior under a race condition that only surfaces at scale.
Trace it, don’t just scan it. Pick the most critical path in the diff. Follow it from input through every transform to output. Check boundary conditions (zero, max, empty), missing validation on external values, permission checks on every branch, and surprising conditional logic.
Require a new test that fails on the pre-change behavior. If the agent can’t write a test that would have caught the bug it claims to fix, the fix is incomplete or the understanding is wrong.
4. Agentic ghosting
You leave a thorough review. You explain the issue, provide context, suggest a direction. The pull request goes quiet. Or the agent responds, misses the point entirely, and runs in circles. You invest another round. Still nothing useful.
Larger pull requests with no structured plan correlate strongly with agent abandonment or misalignment. The larger and less scoped the pull request, the more likely you’re going to sink review time into something that goes nowhere.
Before you invest deep review in a large agent pull request, check the pull request history. Has it been responsive in earlier rounds? Does it have a clear implementation plan, or did the agent just start writing code?
If there’s no plan, request a breakdown before you write a single comment. Copy-paste version:
“This pull request is too large for me to review without a clearer implementation plan. Can you break it into smaller scoped pieces, or add a summary of what each part does and why it’s structured this way? Happy to review after that.”
Firm, short, not personal. And it saves you an hour.
5. Untrusted input in workflows
Prompt injection in CI agents is real and underappreciated. Here’s the pattern: an agent workflow reads content from a pull request body, an issue, or a commit message. That content gets interpolated into a prompt. The prompt goes to a model. The model output gets piped to a shell command. The whole thing runs with GITHUB_TOKEN permissions.
When you’re reviewing any workflow that calls an LLM, these are blockers:
- Is untrusted user input (pull request bodies, issue bodies, commit messages) being interpolated into prompts without sanitization?
- Is GITHUB_TOKEN write-scoped when it only needs read access?
- Is model output being executed as shell commands without validation?
- Are secrets accessible to the agent step or being printed to logs?
What to require before merge:
- Least-privilege permissions in the workflow YAML (permissions: read-all is a reasonable default)
- Untrusted content sanitized and quoted before it touches a prompt
- The “analysis” step separated from the “execution” step, with a human approval gate for anything touching production
- Model output never passed to eval
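A minimal sketch of what those requirements look like in a workflow file. The script paths and the approval-gated production environment are hypothetical; the shape is what matters:

```yaml
permissions:
  contents: read            # least privilege: no write scope by default

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Pass untrusted content through an env var; never interpolate
      # ${{ github.event.* }} directly into the run: script body.
      - name: Analyze PR body
        env:
          PR_BODY: ${{ github.event.pull_request.body }}
        run: ./scripts/analyze.sh "$PR_BODY"   # hypothetical script

  apply:
    needs: analyze
    runs-on: ubuntu-latest
    environment: production   # configured with required reviewers: human gate
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/apply.sh                # execution kept separate
```

Splitting analysis from execution means a prompt-injected model can, at worst, produce a bad recommendation, not a bad deployment.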
The 10-minute review pass
| Time | Step | What to do |
|---|---|---|
| 1–2 min | Scan and classify | Look at the file list and diff size. Narrow task (docs, CI, small change) or complex (multi-file, logic, performance, tests)? That classification sets your review depth for everything that follows. |
| 2–3 min | Check CI changes first | Before reading a single line of app code, look at anything touching .github/workflows, test configs, coverage settings, or build scripts. Flag anything that weakens CI. Stop-sign check. |
| 3–5 min | Scan for new utilities | Search for new functions, helpers, or modules. For each one, do a quick repo search to check for duplicates. Flag anything that reinvents existing functionality. |
| 5–8 min | Trace one critical path | Pick the most important logic change. Trace it end-to-end: input → transforms → output. Check boundary conditions, permissions, unexpected branching. This is the step you can’t skip. |
| 8–9 min | Security boundaries | If the pull request touches any workflow that calls an LLM or handles untrusted input, run through the security checklist above. |
| 9–10 min | Require proof | For any non-trivial logic change, require a test that fails on the pre-change behavior. No rollback plan for risky changes? Ask for one. |
When to request a smaller pull request:
- The diff touches more than five unrelated files
- You can’t describe the purpose of the pull request in a single sentence
- The agent has no implementation plan or the pull request body is empty
- CI is failing and the only changes in the diff are to test files
Let Copilot review it first
Use automated review for what it’s good at: catching the mechanical stuff before a human has to. Copilot code review flags style inconsistencies, obvious logic errors, missing error handling, and type mismatches. It handles the low-level scan. That frees you up for the judgment work, which is where your time actually matters.
Treat it as a prerequisite, not a replacement. Let Copilot run first. If it catches something obvious, let the author address it before you invest your review time.
You can tune this with custom instructions specific to your team: flag anything that modifies CI thresholds, surface new utilities for deduplication review, check that every external input is validated. The more specific your instructions, the more useful the automated pass.
💡 Pro tip: I recently experimented with codifying my own review rules using the Copilot SDK. Instead of remembering to run the same security checks on every pull request, I built a workflow that takes my personal rules (auth on admin endpoints, tests actually running, safe env variable handling) and runs them against the diff automatically. If it finds critical issues, it blocks the merge.
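As an illustration, a team-level instructions file (for Copilot code review this lives at .github/copilot-instructions.md; the rules below just restate the checks from this post, so adapt them):

```markdown
## Code review instructions

- Treat any change to coverage thresholds, skipped tests, or CI workflow
  conditions as a blocker; ask for explicit justification.
- For every new utility function or helper, note whether an existing
  module already provides equivalent functionality.
- Flag any external input (request bodies, query parameters, environment
  variables) that is used without validation.
```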
Judgment is the bottleneck, and that’s fine
The surface area of code is growing. Pull request volume is growing. The time you spend scanning boilerplate should shrink.
What doesn’t shrink is the context you carry. The things you know about your system that aren’t written down anywhere. That’s what makes your review valuable, and it’s the part that doesn’t get automated.
Three takeaways:
- Any CI weakening is a hard stop.
- Let the agents scan first. You trace the critical path.
- Make the red-flag checklist your default on complex agent pull requests.