The CLAIMS Methodology
A structured framework for triaging research papers — six dimensions, one mandatory mechanism-mapping analogy, and a brown-bag voice.
CLAIMS v1.0 — last updated May 2026
A sibling to GRIN
VibesWire applies a structured analytical framework called CLAIMS to academic papers and technical research releases. CLAIMS is a sibling to GRIN — both are versioned, AI-applied analytical frameworks operated under the same Methodology Log — but they do different work. GRIN scores news stories on extraction and value flow. CLAIMS teaches readers to triage research papers.
This page documents the CLAIMS framework, the editorial commitments it encodes, the reader it serves, and the intellectual traditions it draws from. As with GRIN, the goal is total transparency. A reader who disagrees with our analysis of a paper should be able to point at the framework, identify the specific dimension or judgment they take issue with, and make a substantive argument about it.
Why CLAIMS exists
Science journalism is mostly broken in two directions. The press-release direction — university PR offices, EurekAlert, most general-interest outlets — translates papers into hype, treating every preprint as a breakthrough and every claim as if it were validated at scale. The opposite direction — academic critique, replication-crisis discourse, deep statistical takedowns — is rigorous but illegible to anyone outside the field. There is a thin middle layer of writing that does the actual job: contextualizing a paper accurately, telling readers where it sits against prior work, naming what was tested and what was assumed, and flagging when a “novel approach” is the seventh paper in a saturated subfield.
That middle layer exists in places — Quanta Magazine, Nature News & Views, a handful of individual writers — but it is rare and rarely systematic. CLAIMS is built to do that work consistently, at scale, applied to papers in any technical field a reasonably technical reader might want to triage.
The framing question CLAIMS asks about every paper is: what is the actual category of result here, and how should a senior engineer reading outside their subfield treat it? That is a different question from “is this paper interesting” or “should we cover this.” It is a question about epistemic placement.
The reader
CLAIMS is written for a specific reader: a senior engineer or technical leader reading a paper outside their primary subfield. This reader knows algorithms, ML basics, statistics, and standard research methodology. They can handle math if it is set up properly. They have a working bullshit detector and they are not flattered by hype.
This reader does not need a paper summarized. They can read the paper. What they need is help triaging — should they keep reading, bookmark it for later, ignore it, or worry that their stack is about to be obsoleted? CLAIMS produces the briefing that answers that question, with enough context that they can extend the analysis to the next paper they read in the same area on their own.
This reader profile shapes everything else in the framework. The voice is brown-bag, not press-release. The numbers are reported brutally. The contextualization assumes the reader will challenge any unsupported claim. Condescension (“simply put...”, “in layman's terms...”) is banned. Hype is banned. The reader is treated as a peer, not an audience.
The six dimensions
Every CLAIMS briefing covers six dimensions, in order. Each dimension is required. Skipping any of them indicates that the analysis is incomplete or that the paper does not have a defensible answer on that dimension — which is itself useful information for the reader.
Claim. The single ‘never been done before’ sentence, qualifiers stripped.
The Claim is not what the paper says it is. Papers often open with “we propose a novel approach to X” — that is throat-clearing, not a claim. The Claim is what the paper has actually committed to demonstrating; what would be falsified if the result did not hold. The novelty question is binary: either this is a genuine first or it is the Nth iteration of an existing template, and if it is the latter, the rest of the briefing is about what makes this iteration interesting (or doesn't).
Ladder. Where this sits against the strongest classical or prior-art baseline.
The most quantitative dimension. Headline numbers — accuracy, latency, compression ratio, sample efficiency — placed against the strongest comparable prior result, with names and numbers. Editorial honesty about benchmark choice happens here. If the paper compares to a stale baseline, CLAIMS notes it. If classical methods still win at the demonstrated scale, CLAIMS says so plainly. The Ladder is not graded on slope; ties at 1/10th the data are on a different ladder than 5%-better wins. CLAIMS makes both ladders explicit so the reader can place the paper.
Architecture. Algorithmic family, structural choices, and the compute or hardware property the method leans on.
Architecture is what makes papers commensurable across the field. A reader who understands “this is a JEPA-family model with a discrete autoregressive decoder” can place the paper among other JEPA work and extend the analysis to similar papers. CLAIMS reports architecture at the level a peer would discuss it: family, key structural choices, compute properties, hardware assumptions. Not enough detail to reimplement — enough that the paper is now mentally indexed alongside its cousins.
Integrity. Who grades the homework, how circular the validation is, and whether benchmarks were chosen before or after the results.
Where p-hacking, cherry-picking, and selection bias live. CLAIMS reports the evaluation regime honestly. A paper validated on the authors' own probe diagnostic is doing different work than a paper validated on a community benchmark with public test sets. This dimension also notes scope limits the paper itself may not emphasize: if the experiments were run only on 3-second clean-speech clips, the paper has not demonstrated anything about long noisy audio — even if the framing implies broader applicability.
Milestone. The next concrete number that would unlock the next real thing — and how far away it is.
Converts the paper's progress into a number a reader can track over the next few years. Not “more research needed” — that is filler. Specifically: “33 qubits today, 150 qubits unlocks 50 amino acids.” “TIMIT 3-second clips today, LibriSpeech-960h with downstream ASR is the next unlock, roughly 1–2 years out if the approach scales.” The Milestone makes papers tractable as bets, and briefings useful as bookmarks rather than one-time reading.
Successor. The obvious next experiment the authors did not run, and an honest read on why.
Three honest possibilities for any missing experiment: (a) they ran out of compute or budget, (b) they ran it and it did not work and quietly did not report, (c) they are saving it for the next paper. CLAIMS picks one and explains the read. This is the dimension most likely to be uncomfortable for paper authors — saying “we suspect (b) here” means publicly speculating that authors hid a negative result. CLAIMS does this when warranted because the alternative is letting unsupported framings circulate uncontested.
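The six-dimension structure can be sketched as a record with one required field per dimension, plus the completeness check the framework implies: a skipped dimension is a signal, not a formatting slip. This is a hypothetical sketch — the field names follow the C-L-A-I-M-S ordering described above, but nothing here is VibesWire's actual internal format.

```python
from dataclasses import dataclass, fields

@dataclass
class ClaimsBriefing:
    """Hypothetical record type for a CLAIMS briefing (illustrative only)."""
    analogy: str       # mandatory mechanism-mapping opener
    claim: str         # the single "never been done before" sentence
    ladder: str        # placement against the strongest prior baseline
    architecture: str  # algorithmic family and structural choices
    integrity: str     # validation regime and benchmark provenance
    milestone: str     # next concrete number, and distance to it
    successor: str     # the experiment not run, and an honest read on why

def missing_dimensions(briefing: ClaimsBriefing) -> list[str]:
    """Return the names of any dimensions left empty.

    A non-empty result signals an incomplete analysis -- which, per the
    framework, is itself useful information for the reader.
    """
    return [f.name for f in fields(briefing)
            if not getattr(briefing, f.name).strip()]
```

For example, a briefing drafted without a Successor read would surface `["successor"]` rather than silently publishing with five dimensions.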
The opening move — the Analogy
Every CLAIMS briefing opens with a vivid analogy from something the reader already knows from daily life or another field. This is structural, not decorative.
Mechanism-mapping
“If you've used a spell-checker, you already understand edit distance — and that's the property this paper wants for audio tokens.”
Topic-mapping (banned)
“Imagine teaching a computer to listen like a human.”
The mechanism-mapping requirement is the most demanding part of the framework. It forces the writer (or AI applying the framework) to actually understand what the paper is doing at the mechanism level. If you cannot find a mechanism-mapping analogy from everyday life or an adjacent field, the analysis is not done — the writer does not understand the mechanism well enough to explain it. Bad analogies are worse than no analogy because they map the reader's intuition to the wrong structure.
The Analogy is the hook. It makes the reader engage with the mechanism before they know they are reading about science. By the time they hit the Claim, they are already thinking about the mechanism in the right shape.
This commitment is borrowed from the best science journalism (Quanta Magazine is the contemporary exemplar) and from a long tradition of mechanism-mapping pedagogy — Feynman's diagrams, Schelling's segregation models, Lakoff's work on metaphor as cognition. CLAIMS is not original in valuing analogy; CLAIMS is unusual in requiring it as a structural element of every briefing.
Brown Bag voice
The CLAIMS voice is informal technical conversation between peers — the kind of conversation that happens in a brown-bag lunch when a senior researcher walks the team through a paper they read over the weekend.
The voice uses “you” and “we”. The reader is a peer, not an audience. The writer is a colleague, not an authority.
Numbers are reported brutally. “Success rates of 20–40% require many shots.” “Classical baseline still wins by 3x at this scale.” “The 55% token reduction is real, but the benchmark is small and clean.” Hedging numbers in soft language is editorial cowardice.
The voice contextualizes against the field’s actual fight. Most subfields have a live argument going — what question is being contested, what evidence would settle it, who is on each side. CLAIMS places the paper in that argument. Without that contextualization, the briefing is a description of one paper rather than a placement of it.
Condescension is banned. “Simply put...”, “in layman's terms...”, “as you may know...” — all banned. The reader does not need to be reassured that they can handle the material. They are a senior engineer. Treat them like one.
Hype is banned. No “revolutionary,” no “paradigm-shifting,” no “groundbreaking.” If the paper is genuinely a paradigm shift, CLAIMS explains what was believed impossible and why. If it is a small step, CLAIMS says so plainly. The vocabulary of hype is unavailable to CLAIMS analyses by editorial commitment.
The reader walks away with a lens, not facts. The job of a CLAIMS briefing is not to deliver a summary of one paper. It is to teach the reader a way of thinking about papers in this area — so that the next paper they read in the same subfield is easier to triage on their own. If the reader has only learned facts about one paper, the briefing failed at its primary job.
What CLAIMS scores — and what it doesn’t
CLAIMS produces a VibesScore in the 0–100 range, but unlike GRIN's mechanical formula composition, the CLAIMS VibesScore is a holistic editorial judgment. Scientific significance does not compose mechanically. A paper can be genuinely novel but poorly validated, or rigorously tested but trivial, or saturating an existing template with the best execution to date. None of these reduce to a weighted sum.
The CLAIMS VibesScore reflects an overall judgment informed by:
- Novelty against prior art (Claim and Ladder)
- Validation strength (Integrity)
- Field significance — whether this changes how practitioners think or work
- Honesty of presentation — does the paper overclaim or stay honest about scope
- Reachability of the next milestone — is this paper a step toward something concrete or an isolated curiosity
CLAIMS does not score:
- The paper's correctness in detail. CLAIMS evaluates the structure of the paper's argument and validation regime. It does not perform peer review on the math, the code, or the proofs.
- The authors' research program in general. Each paper is scored on what it has demonstrated, not on the authors' reputation or their broader contribution to the field.
- The press release or institutional framing. CLAIMS evaluates the paper, not the university PR office's interpretation of it.
- The paper's ideological alignment. Papers from any institution or country are scored on the same criteria.
- Predicted impact. CLAIMS asks “what is the next milestone and how far away is it” — but it does not predict whether the milestone will be hit.
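The scoring structure above can be sketched as a record that stores the judgments alongside the score, with the score assigned as a whole rather than computed from the factors — mirroring the commitment that significance does not compose mechanically. All names here are invented for illustration; this is not VibesWire's actual scoring format.

```python
from dataclasses import dataclass

@dataclass
class VibesScoreRecord:
    """Hypothetical record of a CLAIMS VibesScore and the judgments behind it."""
    novelty_note: str       # Claim and Ladder judgment
    validation_note: str    # Integrity judgment
    significance_note: str  # field-level significance
    honesty_note: str       # overclaiming vs honest scope
    milestone_note: str     # reachability of the next unlock
    score: int              # holistic 0-100 judgment: assigned, never derived

    def __post_init__(self) -> None:
        # The only mechanical check is the range; there is deliberately
        # no weighted-sum formula anywhere in this record.
        if not 0 <= self.score <= 100:
            raise ValueError("VibesScore must be in the 0-100 range")
```

The design choice worth noticing is what is absent: no weights, no aggregation function. The notes exist so a reader can audit the judgment, not so a formula can reproduce it.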
Editorial commitments encoded in CLAIMS
Like GRIN, CLAIMS encodes specific editorial commitments. Stating them explicitly is the price of asking readers to take the analysis seriously.
- The paper's framing is not authoritative. CLAIMS is willing to disagree with the paper's own characterization of its novelty, scope, or significance. “We propose a novel approach to X” does not make the paper novel; the analysis of the Claim and Ladder dimensions does.
- Validation regime matters more than result magnitude. A 5% improvement validated against the strongest baseline on a community benchmark is more meaningful than a 50% improvement validated against a stale baseline on the authors' own probe. CLAIMS weights Integrity heavily.
- Specific scope must not be generalized. A paper that demonstrates an effect on 3-second clean-speech clips has not demonstrated anything about long-form noisy audio. CLAIMS refuses to extend results beyond the scope the paper actually tested, even when the framing implies broader applicability.
- Anti-hype is a structural commitment, not a stylistic preference. The vocabulary of hype is unavailable to CLAIMS by editorial design. “Revolutionary,” “groundbreaking,” “paradigm-shifting” are banned not because they are tacky but because they smuggle conclusions into the framing.
- Speculation about why an experiment was not run is allowed when warranted. The Successor dimension is willing to speculate publicly that authors ran an experiment and quietly did not report negative results. This is uncomfortable but necessary — the alternative is letting unsupported framings circulate uncontested.
- The reader's time is valuable. Every CLAIMS briefing is structured so a reader can extract triage value within 90 seconds of reading. The longer-form analysis exists for readers who want to go deeper, but the headline judgments are upfront.
The role of AI
CLAIMS, like GRIN, is applied by an LLM (currently Claude). The same locus-of-bias argument applies: the framework is authored by VibesWire's editorial team and encoded in a system prompt; the framework is then applied consistently across papers without per-paper human bias injection.
CLAIMS has one additional consideration that GRIN does not. The Analogy requirement is the most demanding generative move in either framework — requiring the model to find a mechanism-mapping analogy from everyday life that captures a paper's core mechanism. This is the dimension where the AI is most likely to fail in interesting ways, and it is the dimension we audit most closely. A bad analogy in a CLAIMS briefing is a more serious editorial failure than a bad scoring decision in GRIN, because it actively miseducates the reader. The Methodology Log tracks Analogy quality as a separate audit metric.
The other CLAIMS-specific consideration: research papers contain technical claims that the AI may or may not correctly understand. For papers in fields where misreading the architecture or method would mislead readers, CLAIMS analyses are reviewed for technical correctness before publication. This human review does not change the score or the framing — it checks the technical content for errors. Reviewer interventions are noted in the Methodology Log when patterns emerge.
The Methodology Log (shared with GRIN)
CLAIMS shares VibesWire's Methodology Log with GRIN. Both frameworks are versioned together, with quarterly entries covering changes to either or both. Major version increments are independent — CLAIMS v1.x and GRIN v2.x can coexist — but the publication cadence and the audit mechanisms are unified.
Each Methodology Log entry covers, where applicable to CLAIMS:
- Changes to the six-dimension framework (additions, removals, redefinitions)
- Changes to the Analogy requirement or assessment criteria
- Changes to the Brown Bag voice rules
- Calibration deltas if scoring rubric changes
- Patterns in reader feedback or reviewer interventions that motivated changes
The structural commitments CLAIMS will not change without strong evidence and major version increment:
- The six-dimension structure (C, L, A, I, M, S)
- The mandatory Analogy opening
- The mechanism-mapping requirement for analogies
- The Brown Bag voice (including the bans on condescension and hype)
- The reader profile (senior engineer outside their subfield)
Feedback and adversarial review
CLAIMS uses the same three-mechanism feedback structure as GRIN:
Reader submissions. Form for “things CLAIMS got wrong” — technical errors, missed prior art, misread methodology, weak analogies. Patterns are summarized in each quarter's Methodology Log.
Commissioned adversarial review. Annually, we hire a critic with explicit hostility to the framework — typically a working researcher in a field we cover — to audit a sample of briefings and identify systematic gaps. The critic's report is published alongside our response.
Cross-publication comparison. Quarterly, we compare a sample of CLAIMS briefings to coverage of the same papers in Quanta, Nature News & Views, the original press release, and the authors' own framings. Divergences are documented. This is not because we treat any of those sources as authoritative — they are not — but because the divergences make our editorial choices visible by contrast.
Glossary
Analogy. The mandatory opening of every CLAIMS briefing — a vivid mechanism-mapping comparison from everyday life or an adjacent field. Banned: topic-mapping analogies.
Architecture. The algorithmic family and structural choices that make a paper commensurable with related work.
Brown Bag Voice. The editorial register CLAIMS uses — informal technical conversation between peers, with brutal honesty about numbers and explicit field-fight contextualization.
Claim. The single committed statement of what the paper has actually demonstrated, qualifiers stripped. Distinct from the paper's framing of its own novelty.
Field Fight. The live argument in the subfield that the paper participates in. CLAIMS places every paper in its field fight.
Integrity. The validation regime — who is grading the homework, how circular the validation is, whether benchmarks were chosen before or after results were known.
Ladder. The placement of the paper against the strongest comparable prior result, with named baselines and reported numbers.
Lens. The generalizable way of thinking about papers in this area that CLAIMS aims to leave with the reader. The lens is the deliverable; the briefing is the vehicle.
Milestone. The next concrete number that would unlock the next real thing — convertible into a tracker over 2–5 years.
Successor. The obvious next experiment the authors did not run, with an honest read on why.
Intellectual genealogy and related reading
CLAIMS draws on three traditions: science journalism that does the actual job, technical critique that maintains epistemic standards, and the broader philosophy-of-science literature on what makes claims real.
Science journalism that does the work
Quanta Magazine. The contemporary anchor. Natalie Wolchover, Charlie Wood, Rodrigo Pérez Ortega, and the rest of the staff produce science writing that maps mechanisms, contextualizes against field debates, and trusts the reader. CLAIMS's voice and structure owe Quanta a direct debt. If you want to see what CLAIMS aspires to in execution, read Quanta on a paper in a field you don't work in and notice how much you understand by the end.
Nature News & Views and Science Perspectives. The format invented for this work — a working scientist contextualizing a paper for non-specialists in the same broad field. Closer to peer review than to journalism. CLAIMS's Ladder dimension imitates the News & Views move of placing a paper against named prior work with explicit numbers.
Carl Zimmer (NYT, She Has Her Mother’s Laugh). Biology coverage that mechanism-maps consistently. Long-form, but every paragraph is doing structural work.
Ed Yong (formerly The Atlantic). Pandemic and biology reporting that stays honest about uncertainty, scope, and the difference between what was tested and what is being claimed.
Anil Ananthaswamy. Physics writing with deep mechanism-mapping. Through Two Doors at Once is the model for explaining a structural fight in a field through specific experiments.
Sabine Hossenfelder. Physics with brutal honesty about hype. Her work is the contemporary exemplar of the anti-hype commitment CLAIMS encodes structurally.
Technical critique that maintains standards
Gary Marcus. Public auditing of AI claims, often in real time. The willingness to say “this paper does not show what its press release says it shows” is the move CLAIMS automates.
Andrew Gelman, Statistical Modeling, Causal Inference, and Social Science. The long-running blog on statistical hygiene, p-hacking, and the difference between papers that survive scrutiny and papers that do not. CLAIMS's Integrity dimension is downstream of Gelman's framing.
Ben Recht, ArgMin Gravitas. ML paper criticism with mathematical rigor. The model for how to disagree with a paper's claims without becoming a crank.
Arvind Narayanan & Sayash Kapoor, AI Snake Oil. Systematic auditing of overclaiming in AI. The book and the associated newsletter are CLAIMS-adjacent in spirit — both exist because most AI coverage is broken in a specific way.
Scott Alexander, Astral Codex Ten. Meta-analytical posture toward published claims. The willingness to engage seriously with a paper while still concluding “this probably does not replicate” is the editorial posture CLAIMS encodes.
Gwern Branwen, gwern.net. Long-form deep dives that take papers seriously while maintaining ruthless calibration. The rigor of citation and the willingness to track predictions over years are aspirations.
Philosophy of science — what makes claims real
Karl Popper, The Logic of Scientific Discovery. The falsifiability framing. CLAIMS's Successor dimension — what experiment was not run, and what would falsify the paper's claim — is downstream of Popper directly.
Imre Lakatos, The Methodology of Scientific Research Programmes. The framing of research traditions as competing programs with progressive and degenerative phases. CLAIMS's Ladder dimension implicitly asks where a paper sits in its program — extending it productively, defending a degenerative core, or starting a new one.
Thomas Kuhn, The Structure of Scientific Revolutions. The distinction between normal science and paradigm shifts. CLAIMS uses this distinction explicitly when evaluating the Claim — most papers are normal science (the Nth iteration of a template); a small minority propose new templates.
John Ioannidis, “Why Most Published Research Findings Are False” (2005). The foundational paper of the replication crisis. CLAIMS's Integrity skepticism — assuming validation is weak until shown otherwise — is a downstream commitment.
Brian Nosek and the Center for Open Science. The institutional response to the replication crisis. Pre-registration, open data, registered reports. CLAIMS implicitly favors papers that engage with these practices and notes when they do not.
Methodological inspiration — analogy as cognition
Richard Feynman, Surely You’re Joking, Mr. Feynman and the lectures. The patron saint of mechanism-mapping pedagogy. The commitment that “if you can't explain it to a freshman, you don't understand it” is the spiritual ancestor of CLAIMS's Analogy requirement.
George Lakoff, Metaphors We Live By (with Mark Johnson). The cognitive-linguistics argument that metaphor is not decorative but constitutive of thought. CLAIMS's commitment to mechanism-mapping analogies as structural is downstream of Lakoff.
Thomas Schelling, Micromotives and Macrobehavior. The mid-20th-century master of using simple analogies (the segregation checkerboard, the lighting-the-dark-room thought experiment) to illuminate complex social mechanisms. The model for analogy that does load-bearing intellectual work.
Daniel Kahneman, Thinking, Fast and Slow. Mechanism-mapping at scale across cognitive science. The book is a 500-page demonstration of how to explain technical findings through analogies that map to real phenomena.
CLAIMS is maintained by VibesWire and operated jointly with the GRIN framework.
CLAIMS exists because most science journalism is broken in specific ways.