`/review-engine` skill: contract

Project-local Claude Code skill at .claude/skills/review-engine/SKILL.md. Invoked with /review-engine on a branch with engine changes.

Why this exists

The OGUR intelligence engine (ogur/engine/) is the silent-failure surface of the product. A wrong prompt produces plausible output for the wrong indication. A schema mismatch only fails at the verifier seam. A patch-where-used violation passes type-checking and lints clean but breaks test isolation.
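The patch-where-used failure is easy to reproduce in isolation. The sketch below builds two stand-in modules in memory, `llm_client` and `engine_stage` (both hypothetical; neither is an OGUR module), and shows that patching a name where it is defined leaves the module that imported it untouched. It illustrates the general `unittest.mock` behaviour only and does not describe how the engine's tests are written.

```python
import sys
import types
from unittest import mock

# Hypothetical module standing in for a client library (not OGUR code).
llm_client = types.ModuleType("llm_client")
llm_client.call_model = lambda prompt: "real response"
sys.modules["llm_client"] = llm_client

# Hypothetical engine module that binds the name at import time,
# exactly as `from llm_client import call_model` does in a real file.
engine_stage = types.ModuleType("engine_stage")
sys.modules["engine_stage"] = engine_stage
exec(
    "from llm_client import call_model\n"
    "def run_stage(prompt):\n"
    "    return call_model(prompt)\n",
    engine_stage.__dict__,
)


def patched_where_defined():
    # Replaces llm_client.call_model, but engine_stage still holds
    # its own reference to the original function.
    with mock.patch("llm_client.call_model", return_value="fake"):
        return engine_stage.run_stage("q")


def patched_where_used():
    # Replaces the name that run_stage actually looks up.
    with mock.patch("engine_stage.call_model", return_value="fake"):
        return engine_stage.run_stage("q")


print(patched_where_defined())  # the real function still runs
print(patched_where_used())     # the mock is used
```

Both variants type-check and lint clean, which is why the first one counts as a silent failure: the test believes it is isolated while the real function still runs.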

Standard code review (including /code-review and /code-review --effort ultra) reports findings based on reviewer confidence alone. Most reviewers, human or LLM, over-report because the cost of a false positive feels lower than the cost of a missed bug. In practice, false positives are the more expensive failure mode: they burn the author's attention on non-bugs and erode trust in future findings.

/review-engine inverts the bias. A finding is reportable only if it comes with a runnable artifact that fails. Suspicions without proof are downgraded to questions for the human reader. The output is dense with evidence, not broad with opinion.

The contract

| Category | Required artifact | Example |
| --- | --- | --- |
| A: Engine/data correctness | A failing `pytest` using the factories in `tests/conftest.py` | `make_signal(...)`, `make_drug_profile(...)`, `make_kiq(...)`, `make_landscape(...)` |
| B: LLM-stage contract | A model output that `validate_kiq_answers` or `validate_entity_references` rejects, or a Pydantic schema validation error | `ogur.engine.verification.validate_kiq_answers(answers, expected)` |
| C: Invariant violations | A DB state that violates `content_hash` uniqueness, `DrugSynonym` normalization, `DetectedChange` value-object, or `patch-where-used` | StaticPool in-memory SQLite via the `patch_db` fixture |

If none of these can be produced, the finding goes in a "Questions for the human" section, phrased as a question. The reader can then decide whether to investigate.
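As an illustration of a Category C artifact, the sketch below builds an in-memory SQLite state in which two rows share a `content_hash`, plus a query that surfaces the duplicates. It is written under stated assumptions: the `signals` table, its columns, and both helper names are hypothetical, and it uses the standard-library `sqlite3` module in place of the project's `patch_db` fixture, whose interface is not described here.

```python
import sqlite3


def build_violating_state():
    """Create an in-memory DB where content_hash is duplicated."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema. The missing UNIQUE constraint on content_hash
    # models the bug that the finding would be claiming.
    conn.execute(
        "CREATE TABLE signals ("
        "id INTEGER PRIMARY KEY, "
        "content_hash TEXT NOT NULL, "
        "title TEXT)"
    )
    conn.executemany(
        "INSERT INTO signals (content_hash, title) VALUES (?, ?)",
        [
            ("abc123", "first ingest"),
            ("abc123", "re-ingest of the same content"),
            ("def456", "unrelated signal"),
        ],
    )
    return conn


def duplicate_content_hashes(conn):
    """Return every content_hash that appears on more than one row."""
    rows = conn.execute(
        "SELECT content_hash FROM signals "
        "GROUP BY content_hash HAVING COUNT(*) > 1 "
        "ORDER BY content_hash"
    ).fetchall()
    return [row[0] for row in rows]


conn = build_violating_state()
print(duplicate_content_hashes(conn))  # non-empty list means the invariant is broken
```

In a real review the same check would be phrased as an assertion that the duplicate list is empty, so that the attached artifact fails until the invariant is restored.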

Scope

In scope: ogur/engine/*.py and tests/unit/engine/test_*.py
Out of scope: sources, store, API, frontend. These are covered by other reviews.
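The scope rule can be sketched as a path filter over the changed files on a branch. This is a minimal sketch, not the skill's actual implementation: `in_scope` and `split_changed_files` are hypothetical names, and reading `*` as "exactly one path segment" (so subdirectories are excluded) is an assumption.

```python
import re

# Patterns transcribed from the scope statement. Treating `*` as a single
# path segment (no nested directories) is an assumption of this sketch.
IN_SCOPE_PATTERNS = (
    re.compile(r"ogur/engine/[^/]+\.py"),
    re.compile(r"tests/unit/engine/test_[^/]+\.py"),
)


def in_scope(path: str) -> bool:
    """True if a repo-relative POSIX path falls under the review scope."""
    return any(pattern.fullmatch(path) for pattern in IN_SCOPE_PATTERNS)


def split_changed_files(paths):
    """Partition changed files into (reviewed, skipped), preserving order."""
    reviewed = [p for p in paths if in_scope(p)]
    skipped = [p for p in paths if not in_scope(p)]
    return reviewed, skipped
```

Listing the skipped files explicitly in the output would make it clear to the reader which parts of a diff still need one of the other reviews.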

How it differs from existing review surfaces

| Skill | Bias | Best for |
| --- | --- | --- |
| `/code-review` (built-in) | Reports findings on confidence | Quick PR review across any code |
| `/code-review --effort ultra` | Multi-agent cloud review; broader coverage | Big diffs or pre-merge confidence |
| `/review-engine` (this skill) | Refuses to report unproven findings | Engine-touching PRs where silent failure is the cost model |

They compose: run /review-engine first for the dense-evidence findings, then /code-review for broader coverage if time allows. The two will overlap on confirmed bugs and disagree on weak signals, by design.

When to invoke

Before requesting review on any PR that touches ogur/engine/
After a refactor under ogur/engine/ even if the PR is "no functional change" (refactors silently break invariants)
When CI is green but you don't trust the change yet: the gate runs lint and tests, while this skill checks the contracts the tests don't cover

Future versions

v1 covers the engine. Planned expansions if the noise-control payoff holds:

v2 — add ogur/sources/*.py (10 source modules, well-bounded by content_hash and the Pydantic Signal contract)
v3 — /review-frontend companion skill once the frontend has a similar "silent failure" surface (likely around ConfidenceBadge and SourceChip contracts)
Eventually pair with /adversarial-review (Workstream C) for the highest-stakes prompt and schema changes.

Implementing rule of thumb

When in doubt about whether to ship a finding: try to write the failing test first. If the test doesn't fail, the finding wasn't real. If it does fail, ship it with the proof attached. This is the whole skill, distilled to one line.
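The rule of thumb reduces to a three-way decision, sketched below. All names here (`Finding`, `triage`, and the three outcome strings) are hypothetical and chosen for illustration; the sketch also assumes the artifact is a test-like callable that raises `AssertionError` when the bug is real.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    summary: str
    # A test-like callable that raises AssertionError if the bug is real.
    # None means no runnable artifact could be produced.
    artifact: Optional[Callable[[], None]] = None


def triage(finding: Finding) -> str:
    """Classify a finding as 'report', 'drop', or 'question'."""
    if finding.artifact is None:
        # No proof available: hand it to the human as a question.
        return "question"
    try:
        finding.artifact()
    except AssertionError:
        # The test fails, so the finding ships with the proof attached.
        return "report"
    # The test passed, so the suspected bug was not real.
    return "drop"


def suspected_off_by_one():
    # Stand-in for a real failing test; the assertion is deliberately false.
    assert len([1, 2, 3]) == 4


print(triage(Finding("count is off by one", suspected_off_by_one)))
print(triage(Finding("prompt wording looks risky")))
```

Only `AssertionError` counts as proof in this sketch; an artifact that crashes for another reason propagates the exception, since a broken test demonstrates nothing about the code under review.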
