How it works

Scrutinise is AI built to synthesise scientific evidence and track when the science has moved.

Zero hallucinated citations and full disclosure of the scientific record. Scrutinise is the AI for teams that cannot afford to miss the moment the evidence on their asset changes.

Four-agent pipeline
01
Retriever
Pulls the full evidence corpus using a structured query cascade across four databases — including papers that contradict the claim. Every retrieved DOI is logged before the next agent runs.
02
Analyst
Reads each retrieved paper against a structured rubric: study design, sample size, effect size, follow-up duration, and comparator fidelity. Produces an initial evidence summary without rendering a verdict.
03
Archie — Sceptic
The adversarial agent. Actively tries to falsify the claim — probing for confounding, publication bias, methodological limits, and class contamination. Archie's objections must be explicitly overruled to advance the score.
04
Moderator
Weighs the Analyst's evidence summary against Archie's objections and renders the final calibrated verdict — with written rationale, DOI chain, and evidence-tier breakdown. The output is auditable at every step.
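The hand-off between the four agents can be sketched as a simple sequential pipeline. This is an illustrative sketch only; the function names, data shapes, and hard-coded example values are assumptions for clarity, not Scrutinise's actual internals.

```python
def retriever(claim):
    # Pull the corpus, including papers that contradict the claim;
    # every DOI is logged before the next agent runs.
    corpus = [
        {"doi": "10.1000/xyz.1", "supports": True},
        {"doi": "10.1000/xyz.2", "supports": False},
    ]
    return corpus, [p["doi"] for p in corpus]

def analyst(corpus):
    # Read each paper against a structured rubric; no verdict here.
    return [{"doi": p["doi"], "rubric_notes": "design, n, effect size"}
            for p in corpus]

def sceptic(summaries):
    # Archie: objections that must be explicitly overruled downstream.
    return [{"objection": "possible publication bias", "overruled": False}]

def moderator(summaries, objections):
    # A verdict is rendered only once every objection is resolved.
    open_objections = [o for o in objections if not o["overruled"]]
    if open_objections:
        return {"verdict": None, "blocked_by": open_objections}
    return {"verdict": "Likely", "doi_chain": [s["doi"] for s in summaries]}
```

The key design point the sketch captures: the Moderator cannot emit a verdict while any of Archie's objections remains open.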
Literature retrieval

Four databases, one cascade. Zero hallucinated citations.

The Retriever runs a structured query cascade in sequence. A paper must be resolved to a real DOI in the corpus to be cited. Free-text generation of references is architecturally impossible.

01
Consensus
AI semantic search across 200M+ papers. Full-text access for a growing corpus via publisher partnerships with Elsevier, Springer Nature, Wiley, and others. First tier because it retrieves the most semantically relevant papers for the claim.
02
PubMed
NLM / NCBI indexed literature. Structured MeSH term queries ensure comprehensive coverage of the clinical trial and systematic review literature. Abstract and metadata retrieval.
03
Semantic Scholar
Allen Institute for AI scholarly graph — 200M+ paper index with citation network data. Identifies highly-cited papers and recent work that may not yet be indexed in PubMed.
04
OpenAlex
Open scholarly graph covering 250M+ works including conference proceedings, preprints, and grey literature not captured in clinical databases. Ensures no emerging evidence is missed before it reaches PubMed.
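The cascade-plus-DOI-gate mechanism can be expressed in a few lines. The database names below mirror the cascade described above; the `run_cascade` and `cite` functions and their data shapes are hypothetical illustrations of the principle, not real Scrutinise code.

```python
CASCADE = ["Consensus", "PubMed", "Semantic Scholar", "OpenAlex"]

def run_cascade(claim, search):
    """Query each source in order; keep only DOI-resolved papers, deduplicated."""
    seen = {}
    for source in CASCADE:
        for paper in search(source, claim):
            doi = paper.get("doi")
            if doi and doi not in seen:
                seen[doi] = {**paper, "source": source}
    return list(seen.values())

def cite(doi, corpus):
    """The gate that makes free-text references impossible:
    a citation must resolve to a DOI in the retrieved corpus."""
    if doi not in {p["doi"] for p in corpus}:
        raise ValueError(f"DOI {doi} not in retrieved corpus")
    return doi
```

Note that a paper with no DOI never enters the corpus at all, so it can never be cited downstream.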
Evidence verdict

Five tiers. Each one maps to a defensible threshold.

The Moderator assigns one of five verdicts based on the weight and quality of the retrieved evidence. Tier boundaries are fixed, consistent across all claims and indications, and calibrated against Cochrane systematic reviews using the GRADE certainty framework.

Definitive
≥ 90%
Multiple independent high-quality RCTs. Consistent effect direction. No significant methodological concerns raised by Archie.
Likely
70 – 89%
Strong evidence base with minor gaps or indirect comparisons. Archie's objections addressed but not fully resolved.
Uncertain
40 – 69%
Mixed evidence or methodological concerns that could change the verdict with further data. Active monitoring flag applied.
Doubtful
20 – 39%
Evidence leans against the claim or is too limited to support it. Archie's objections could not be overruled.
Refuted
< 20%
Consistent evidence against the claim. Direct contradicting RCTs or systematic reviews. Claim flagged for immediate attention.
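Because the tier boundaries are fixed, the mapping from score to verdict is a pure lookup. The thresholds below are exactly those in the table; the function itself is an illustrative sketch.

```python
# Fixed tier floors, matching the table above.
TIERS = [
    (90, "Definitive"),
    (70, "Likely"),
    (40, "Uncertain"),
    (20, "Doubtful"),
    (0,  "Refuted"),
]

def verdict_for(score):
    """Map a 0-100 evidence score to its verdict tier."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    for floor, name in TIERS:
        if score >= floor:
            return name
```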
Monitoring cadence

Three types of run. Each with a distinct job.

Scrutinise monitors your pipeline on three overlapping schedules — one that builds the foundation, one that maintains the weekly rhythm, and one that acts as a real-time tripwire for high-priority publications that materially move the underlying evidence.

01
Baseline
Once · at programme setup
A comprehensive one-time review of the full published literature relevant to each monitored claim, with no date floor. Builds the complete evidence picture and produces the starting score. Every subsequent run is measured against it.
02
Weekly cycle
Every Sunday
Checks for new papers published since the previous cycle and re-scores any claim where the evidence has changed. Claims with no new evidence carry their score forward unchanged. The cycle closes with the weekly digest delivered to your team.
03
Daily watch
Every day · between cycles
A lightweight daily scan that detects high-priority publications — Phase 3 readouts, significant new trial results, or papers that directly challenge the current evidence. On detection, Scrutinise re-scores immediately and fires a material mover alert. Most days, nothing fires. That is the intended outcome.
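The three overlapping schedules can be sketched as a simple dispatch rule. Scrutinise's actual scheduling is not public; this only encodes the cadence described above: one baseline at setup, a weekly cycle each Sunday, and a daily watch on every other day.

```python
import datetime

def runs_for(day, baseline_done):
    """Decide which run types fire on a given calendar day (illustrative)."""
    runs = []
    if not baseline_done:
        runs.append("baseline")        # one-time full-literature review
    if day.weekday() == 6:             # Sunday closes the weekly cycle
        runs.append("weekly_cycle")
    else:
        runs.append("daily_watch")     # lightweight tripwire scan
    return runs
```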
Auditability

Every verdict is fully traceable.

Each scoring run produces a complete audit record. Nothing in the output is unverifiable.

Retrieval log
Every query issued, every DOI returned, every paper excluded with reason — timestamped at run time.
Evidence tier per paper
Each paper is assigned a tier: RCT, systematic review, NMA, observational, case series. Tier weighting is fixed and documented.
Archie's objections
The sceptic's challenges are written out in full — confounders identified, biases flagged, methodological limits noted.
Moderator rationale
A written explanation of why Archie's objections were accepted or overruled, with the final DOI chain supporting the verdict.
Movement record
When a verdict changes, the specific paper that drove the change is recorded alongside the previous and current score.
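Taken together, the five elements above imply a per-run record shaped roughly like the sketch below. All field names are assumptions for illustration; only the contents mirror the list above.

```python
import datetime

def audit_record(queries, retrieved_dois, excluded, objections, rationale, movement):
    """Assemble a hypothetical per-run audit record from the five elements."""
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "retrieval_log": {
            "queries": queries,            # every query issued
            "dois": retrieved_dois,        # every DOI returned
            "excluded": excluded,          # each exclusion with its reason
        },
        "evidence_tiers": {doi: None for doi in retrieved_dois},  # filled per paper
        "objections": objections,          # Archie's challenges, in full
        "moderator_rationale": rationale,  # why objections were accepted/overruled
        "movement": movement,              # previous vs current score, driving DOI
    }
```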
Standard AI vs Scrutinise

Commercial LLMs are not good enough to stress-test the evidence.

General-purpose LLMs like ChatGPT or Claude are built to provide fluent, engaging answers, not to scrutinise the scientific evidence for people whose decisions depend on it.

− General-purpose AI
+ Scrutinise
Hallucination — fabricates citations, invents trial names, conflates study designs.
Every verdict traces to a real DOI. Papers contradicting the claim are retrieved and scored alongside supporting ones.
Overconfidence — no distinction between three independent RCTs and one underpowered observational study.
Every claim scored on a five-tier verdict scale. Indirect comparisons, methodological limits, and split evidence flagged explicitly.
No structured critique — no adversarial step to challenge a claim, surface methodological weaknesses, or stress-test the conclusion. Essential for defensible scientific review.
Archie the Sceptic is structurally built into every score. The adversarial agent actively tries to falsify each claim before the verdict is rendered, limiting overconfidence errors at the source.
No audit trail — no DOI chain, no retrieval log, no record of what was reviewed. Indefensible in a payer or HTA challenge.
Full corpus log per score: papers retrieved, scored, excluded with reason, and Archie's sceptic rationale — timestamped.
Stale training data — cutoff months or years before you ask the question.
Corpus retrieved live every cycle. The score reflects the literature as it stands this week.
Common questions

How does it work in practice?

How did you test for accuracy?
We validated Scrutinise against Cochrane systematic reviews — the most rigorous independently produced evidence summaries in medicine. Across 200 tested claims drawn from published Cochrane reviews, Scrutinise verdicts matched Cochrane conclusions in 94% of cases. On the remaining 6%, disagreements were traceable to evidence published after the Cochrane review's search cutoff, not to model error. Because Scrutinise updates weekly against the live literature, it can capture evidence that a static systematic review cannot.
How do you claim zero citation hallucinations?
Every verdict in Scrutinise is anchored to a specific DOI retrieved from the corpus during that scoring run. The scoring agents cannot cite a paper that was not returned by the retrieval step — there is no free-text generation of references. Each output record carries a full audit trail: the query used, the DOIs retrieved, the evidence tier each paper was assigned, and the reasoning chain that produced the verdict. If a paper cannot be resolved to a DOI in the corpus, it is excluded. This makes citation hallucination architecturally impossible rather than just unlikely.
Is retrieval full-text?

Discovery is full-text. Synthesis is abstract-only.

Through Scrutinise's partnership with Consensus, the first tier of our retrieval cascade has full-text licensed access to papers from Elsevier, Springer Nature, Wiley, and other major publishers. This lets us identify the papers that genuinely engage a claim, including cases where the relevant evidence sits in the methods or results section rather than the abstract.

The evidence synthesis itself is drawn only from the open abstract and structured metadata of each retrieved paper. Every citation, every anchoring quote, and every passage that informs a verdict comes from material that is openly published. Scrutinise does not reproduce or surface full-text content at any stage.

DOI links in the output point to the publisher's site, so users access papers through their own institutional subscriptions. The architecture lets us search deeper than any abstract-only system while remaining fully compliant with scientific publishing licences.

See it in practice. Request a sample digest for any medicine.
Request a sample digest