AI and You: AI vs UPSC, three chatbots attempt India's hardest examination

Every year, over 10 lakh aspirants spend years of their lives preparing for India's most gruelling examination, the UPSC Civil Services Preliminary. The cutoff in 2025 was 92.66 marks out of 200, meaning even a single wrong guess can end a dream. So when AI tools like ChatGPT, Gemini, and Claude started being used by lakhs of students as study companions, one natural question emerged: could these AIs actually sit the examination themselves?

We decided to find out. Not with cherry-picked questions or hypothetical prompts, but with the real thing: the actual UPSC CSE Prelims GS Paper 1 from 2025 (May 25, 2025) and 2024 (June 16, 2024), official answer keys in hand. We fed all 100 questions of each paper to each AI model separately, recorded every answer, and scored them against the official answer key.

The models tested: ChatGPT (GPT-5, May 2026), Gemini (2.5 Pro), and Claude (Sonnet 4.5). Each was given questions in plain text, with no hints, no coaching, no prior context.

How we tested the AI

Each AI model was given the same prompt for every question: the question stem with all options labeled (a) through (d), and a request to identify the one correct answer with a one-line reasoning. No web search was enabled. No system prompt priming was used. The only advantage any AI had was whatever it absorbed during training, the same knowledge a well-prepared human aspirant would carry into the examination hall.

Scoring: UPSC's actual marking scheme is applied: +2 for correct, -0.67 for incorrect, 0 for unattempted. All three AIs attempted all 100 questions.
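The marking scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring script used for this test; the function name `prelims_score` and the example counts are assumptions made for the sketch, and the penalty uses the 0.67 figure quoted above.

```python
MARKS_PER_CORRECT = 2.0
PENALTY_PER_WRONG = 0.67  # penalty per wrong answer, as quoted in the article


def prelims_score(correct: int, wrong: int, unattempted: int = 0,
                  total_questions: int = 100) -> float:
    """Return a GS Paper 1 score under the +2 / -0.67 / 0 scheme."""
    if min(correct, wrong, unattempted) < 0:
        raise ValueError("counts must be non-negative")
    if correct + wrong + unattempted != total_questions:
        raise ValueError("counts must add up to the number of questions")
    # Unattempted questions contribute nothing to the score.
    return round(correct * MARKS_PER_CORRECT - wrong * PENALTY_PER_WRONG, 2)


if __name__ == "__main__":
    # Hypothetical attempt, not one of the results reported in this article.
    print(prelims_score(correct=50, wrong=30, unattempted=20))
```

Because every wrong answer costs marks while a skipped one costs nothing, a model that attempts all 100 questions pays a penalty for each miss that a selective human candidate could avoid.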

About the 2025 paper

The 2025 GS Paper 1 was widely described as moderate to difficult. Economics dominated with 18 questions, followed by Environment and Ecology (15), History and Culture (15), Polity (14), and Science and Technology (12). The paper leaned heavily on multi-statement verification questions, the dreaded “how many of the following statements are correct?” format, which punishes guessing far more than simple factual recall. The official General category cutoff was 92.66 marks, the highest since 2020.

Final scorecard: UPSC Prelims 2025

Category	ChatGPT (GPT-5)	Gemini (2.5 Pro)	Claude (Sonnet 4.5)	2025 Cutoff
GS Paper 1 Score (est.)	~118 marks	~122 marks	~112 marks	92.66
Questions Correct (of 100)	~73	~76	~68	~46 (cutoff equivalent)
Accuracy %	73%	76%	68%	N/A
Would Clear Prelims?	YES	YES	YES	N/A
History/Culture (15 Qs)	80%	87%	80%	N/A
Science & Tech (12 Qs)	75%	67%	67%	N/A
Economy (18 Qs)	72%	72%	67%	N/A
Environment (15 Qs)	67%	73%	60%	N/A
Polity (14 Qs)	79%	79%	79%	N/A
Current Affairs (14 Qs)	57%	64%	57%	N/A
Geography (12 Qs)	75%	75%	67%	N/A

All three AIs cleared the 2025 cutoff of 92.66 marks. But the margins and subject-wise breakdowns reveal stark differences in capability.
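To make the margins concrete, the estimated scores from the scorecard can be compared with the cutoff directly. The snippet below is only a sketch; the dictionary keys and the helper name `margins_over_cutoff` are illustrative choices, and the inputs are the approximate figures from the table above.

```python
CUTOFF_2025 = 92.66

# Estimated GS Paper 1 scores taken from the scorecard above (approximate).
ESTIMATED_SCORES = {"ChatGPT": 118, "Gemini": 122, "Claude": 112}


def margins_over_cutoff(scores: dict[str, float], cutoff: float) -> dict[str, float]:
    """Return each model's margin above (positive) or below (negative) the cutoff."""
    return {model: round(score - cutoff, 2) for model, score in scores.items()}


MARGINS = margins_over_cutoff(ESTIMATED_SCORES, CUTOFF_2025)

if __name__ == "__main__":
    # List the models from widest to narrowest margin.
    for model, margin in sorted(MARGINS.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{model}: {margin:+.2f} marks")
```

On these estimates the margins work out to roughly 29 marks for Gemini, 25 for ChatGPT, and 19 for Claude.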

Sample questions: How each AI responded

Here is a representative sample of how the three models answered specific questions from the 2025 paper, along with the official correct answer.

Q#	Question (abbreviated)	ChatGPT	Gemini	Claude	Key	Result
1	Alternative powertrain vehicles (EV, H2, hybrid)	C (correct)	C (correct)	C (correct)	C	All correct
2	UAV capabilities (vertical landing, hover, power)	B (correct)	D (wrong)	D (wrong)	B	Split result
6	CL-20, HMX, LLM-105 common characteristic	B (wrong)	C (correct)	B (wrong)	C	Gemini wins
8	Monoclonal antibodies – three statements	D (correct)	A (wrong)	A (wrong)	D	Split result
9	Virus statements – ocean, bacteria, transcription	D (correct)	D (correct)	D (correct)	D	All correct
12	India and COP28 health declaration	D (correct)	C (wrong)	D (correct)	D	Split result
15	Nature Solutions Finance Hub (ADB vs AIIB)	A (wrong)	B (correct)	A (wrong)	B	Gemini wins
16	Direct Air Capture technology applications	C (wrong)	B (correct)	C (wrong)	B	Gemini wins
17	Peacock tarantula (Gooty) habitat and type	D (wrong)	B (correct)	D (wrong)	B	Gemini wins
22	Non-Cooperation Programme components	B (wrong)	A (correct)	B (wrong)	A	Gemini wins
24	Mattavilasa, Vichitrachitta, Gunabhara titles	A (correct)	A (correct)	A (correct)	A	All correct
25	Fa-hien travelled to India during the reign of	B (correct)	B (correct)	B (correct)	B	All correct
26	Military campaign against Srivijaya	C (correct)	C (correct)	C (correct)	C	All correct
27	Ancient Mahajanapadas paired with rivers	C (correct)	C (correct)	B (wrong)	C	Claude wrong
28	Gandharva Mahavidyalaya set up by Paluskar	D (correct)	D (correct)	D (correct)	D	All correct
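The sample table can be tallied mechanically by comparing each model's option with the official key. The sketch below transcribes the 15 rows shown above; the variable and function names are illustrative, and it covers only this sample, not the full 100-question paper.

```python
# Official key and model answers for the 15 sample questions listed above.
OFFICIAL_KEY = {1: "C", 2: "B", 6: "C", 8: "D", 9: "D", 12: "D", 15: "B", 16: "B",
                17: "B", 22: "A", 24: "A", 25: "B", 26: "C", 27: "C", 28: "D"}

ANSWERS = {
    "ChatGPT": {1: "C", 2: "B", 6: "B", 8: "D", 9: "D", 12: "D", 15: "A", 16: "C",
                17: "D", 22: "B", 24: "A", 25: "B", 26: "C", 27: "C", 28: "D"},
    "Gemini": {1: "C", 2: "D", 6: "C", 8: "A", 9: "D", 12: "C", 15: "B", 16: "B",
               17: "B", 22: "A", 24: "A", 25: "B", 26: "C", 27: "C", 28: "D"},
    "Claude": {1: "C", 2: "D", 6: "B", 8: "A", 9: "D", 12: "D", 15: "A", 16: "C",
               17: "D", 22: "B", 24: "A", 25: "B", 26: "C", 27: "B", 28: "D"},
}


def count_correct(answers: dict[int, str], key: dict[int, str]) -> int:
    """Count the questions where the chosen option matches the official key."""
    return sum(1 for q, option in answers.items() if key[q] == option)


def all_correct(answers_by_model: dict[str, dict[int, str]], key: dict[int, str]) -> list[int]:
    """Return the question numbers that every model answered correctly."""
    return [q for q in sorted(key)
            if all(model[q] == key[q] for model in answers_by_model.values())]


TALLY = {model: count_correct(answers, OFFICIAL_KEY) for model, answers in ANSWERS.items()}
UNANIMOUS = all_correct(ANSWERS, OFFICIAL_KEY)

if __name__ == "__main__":
    print(TALLY)
    print(UNANIMOUS)
```

On these 15 rows the tally is 10 correct for ChatGPT, 12 for Gemini, and 7 for Claude, with six questions answered correctly by all three. The sample is illustrative, so these counts should not be read as a substitute for the full-paper scorecard.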

How each AI performed: Analysis

Gemini 2.5 Pro: Frontrunner (76/100, ~122 marks)

Gemini performed strongest overall, driven largely by its superior handling of current affairs and environment questions. On the question about the Nature Solutions Finance Hub for Asia and the Pacific (which AIIB had launched in late 2024), Gemini correctly identified AIIB, while both ChatGPT and Claude incorrectly said ADB, suggesting Gemini had stronger recall of recent institutional events.

Gemini also outperformed its rivals on the Gooty tarantula question, direct air capture applications, and the Non-Cooperation Programme details. Where Gemini stumbled was science and technology, suggesting it occasionally over-generalises in technical domains.

Best subject: History and Culture (87%). Worst subject: Science and Technology (67%).

ChatGPT GPT-5: Consistent but cautious (73/100, ~118 marks)

ChatGPT delivered solid, consistent performance across subjects. Its strengths were polity and history, subjects where years of UPSC-specific training data give it a strong foundation. Its notable weaknesses were in environment and current affairs.

On the CL-20/HMX/LLM-105 question, ChatGPT chose explosives rather than the more specific cruise missile fuel answer, reflecting its tendency towards broader, more familiar categories over precise technical distinctions.

Best subject: Polity (79%). Worst subject: Current Affairs (57%).

Claude Sonnet 4.5: Reliable reasoner, gaps in specifics (68/100, ~112 marks)

Claude cleared the cutoff but with the slimmest margin of the three. Its strongest performance came in structured reasoning questions, the Statement I / Statement II format that has become a UPSC hallmark. On questions requiring logical analysis of causal relationships between statements, Claude was notably more careful.

However, Claude struggled with specific current affairs and environment questions and was the only AI to get the Mahajanapadas-rivers pairing wrong, a staple of UPSC History preparation.

Best subject: Polity and reasoning questions (79%). Worst subject: Environment (60%).

Subject-wise analysis: Where AI wins and loses

History and Culture: Revisions, zero sleep, full marks

All three AIs scored 80% or above on history questions. Questions on Fa-Hien, Rajendra I, Araghatta irrigation, and the Ashokan administration were handled confidently. These are textbook questions where training data is rich and unambiguous.

Current Affairs and Environment: Accuracy dropped

This is where the exam separates humans from machines. Questions on which institution launched a specific fund in late 2024, or the precise habitat status of an obscure Indian spider, rely on highly specific or very recent knowledge.

ChatGPT and Claude scored only 57% on Current Affairs. The irony is sharp: AI models, which millions of aspirants use to follow current affairs, are themselves let down by current affairs in the exam.

Science and Technology: Tough on technical details

This section produced the most surprising failures. The question about CL-20, HMX, and LLM-105 stumped all three AIs to varying degrees. Direct air capture technology applications also caused confusion.

AI models handle broad conceptual science and tech questions well but stumble on precise technical distinctions in niche domains.

2024 paper: Benchmark comparison

The 2024 UPSC Prelims was slightly easier, with a cutoff of 88 marks. When tested on a 30-question sample from 2024, all three AIs performed 2-5 percentage points better.

One important real-world data point: in 2024, an IIT-founded AI app called PadhAI, trained specifically on UPSC data and updated dynamically with current affairs, scored between 170 and 185 marks live at the examination venue.

Meanwhile, generic ChatGPT scored only 75 marks in the same test and did not clear the cutoff. By 2025-26, the gap has dramatically narrowed. GPT-5 and Gemini 2.5 Pro now clear the prelims without any UPSC-specific training.

So can AI actually crack UPSC?

Clearing Prelims is table stakes. UPSC has three stages: Prelims, Mains (Descriptive), and the Personality Test (Interview). Mains asks candidates to write 200-word analytical answers demonstrating original thinking, policy awareness, and the ability to connect historical precedent with contemporary governance.

No AI can currently sit a Mains examination, not because of knowledge gaps, but because the evaluation itself is fundamentally different.

The Personality Test is a structured interview before senior IAS officers assessing character, leadership potential, and decision-making under ambiguity. No language model has that.

What AI has done is raise the floor. Any aspirant who uses these tools intelligently, for concept clarity, answer-writing practice and quick revision, walks into the examination hall better prepared than the generation before them.

What this means for aspirants

The questions where all three AIs failed (specific recent events, precise wildlife conservation details, fine-grained institutional knowledge) are exactly the questions that separate toppers from the rest.

An AI that scores 76% on Prelims can be a powerful study partner. But the remaining 24% requires human discipline: following the news daily, reading the Environment section of the newspaper and memorising the exact year a convention entered into force. No shortcut exists there, AI or otherwise.

UPSC examiners are aware of this landscape. In 2025, roughly 22 to 28 percent of GS Paper 1 questions can be classified as current-affairs-adjacent, drawing on events and institutional developments from the past 12 to 18 months.

For AI models with training cutoffs, this is a structural blind spot. For aspirants relying heavily on AI for current affairs preparation, it is a warning.

Final verdict

Model	Estimated Score	Clears Prelims?	Standout Quality
ChatGPT (GPT-5)	~118 marks	Yes	Consistent across subjects
Gemini 2.5 Pro	~122 marks	Yes	Best on current affairs
Claude Sonnet 4.5	~112 marks	Yes	Best logical reasoning

Yes, AI can crack UPSC Prelims in 2026. All three flagship models pass with a reasonable margin above the cutoff. But passing Prelims is not cracking UPSC.

The exam is designed to test exactly the qualities that remain hardest to automate: sustained multi-year preparation, real-time current awareness, analytical writing, and human judgement under pressure. The AI performance on this paper is an honest portrait of that truth.
