
/review-engine skill — contract

Project-local Claude Code skill at .claude/skills/review-engine/SKILL.md. Invoked with /review-engine on a branch with engine changes.

Why this exists

The OGUR intelligence engine (ogur/engine/) is the silent-failure surface of the product. A wrong prompt produces plausible output for the wrong indication. A schema mismatch only fails at the verifier seam. A patch-where-used violation passes type-checking and lints clean but breaks test isolation.
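The patch-where-used failure is easy to miss because nothing about it is syntactically wrong. The sketch below reproduces it with only the standard library. The module names source_mod and engine_mod and the function fetch_label are hypothetical stand-ins built in memory, not names from ogur/engine/.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical module that defines a function (stand-in for a real dependency).
source_mod = types.ModuleType("source_mod")
exec("def fetch_label():\n    return 'real'\n", source_mod.__dict__)
sys.modules["source_mod"] = source_mod

# Hypothetical module that imports the name and calls it.
engine_mod = types.ModuleType("engine_mod")
exec(
    "from source_mod import fetch_label\n"
    "def describe():\n"
    "    return fetch_label()\n",
    engine_mod.__dict__,
)
sys.modules["engine_mod"] = engine_mod


def patched_where_defined():
    # Replaces the attribute on the defining module only; engine_mod keeps
    # its own reference to the original function, so the patch has no effect.
    with patch("source_mod.fetch_label", return_value="fake"):
        return engine_mod.describe()


def patched_where_used():
    # Replaces the name in the module that looks it up at call time.
    with patch("engine_mod.fetch_label", return_value="fake"):
        return engine_mod.describe()
```

Under these assumptions, patched_where_defined() still reaches the real function while patched_where_used() sees the fake, which is why a test written the first way can pass review and still touch the real dependency.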

Standard code review (including /code-review and /code-review --effort ultra) reports findings on confidence. Most reviewers — human or LLM — over-report because the cost of false positives feels lower than the cost of missing a bug. In practice, false positives are the more expensive failure mode: they burn the author's attention on non-bugs and erode trust in future findings.

/review-engine inverts the bias. A finding is reportable only if it comes with a runnable artifact that fails. Suspicions without proof are downgraded to questions for the human reader. The output is dense with evidence, not broad with opinion.

The contract

Category | Required artifact | Example
A: Engine/data correctness | A failing pytest using the factories in tests/conftest.py | make_signal(...), make_drug_profile(...), make_kiq(...), make_landscape(...)
B: LLM-stage contract | A model output that validate_kiq_answers or validate_entity_references rejects, or a Pydantic schema validation error | ogur.engine.verification.validate_kiq_answers(answers, expected)
C: Invariant violations | A DB state that violates content_hash uniqueness, DrugSynonym normalization, DetectedChange value-object, or patch-where-used | StaticPool in-memory SQLite via the patch_db fixture

If none of these can be produced, the finding goes in a "Questions for the human" section, phrased as a question. The reader can then decide whether to investigate.
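The reporting rule can be sketched as a small triage function. This is a minimal illustration, not the skill's implementation: the Finding record, its fields, and the triage function are hypothetical. It also assumes that a finding whose artifact runs without failing is dropped rather than asked as a question, which is one reading of the rule of thumb at the end of this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Finding:
    """Hypothetical record for one candidate finding."""
    summary: str
    # Zero-argument callable that raises when the suspected bug is real,
    # for example a wrapper around a pytest test body.
    # None means no runnable artifact could be produced.
    artifact: Optional[Callable[[], None]] = None


def triage(findings: List[Finding]) -> Tuple[List[str], List[str], List[str]]:
    """Split findings into (reportable, questions, dropped)."""
    reportable, questions, dropped = [], [], []
    for finding in findings:
        if finding.artifact is None:
            # No proof available: downgrade to a question for the human.
            questions.append(finding.summary)
            continue
        try:
            finding.artifact()
        except Exception:
            # The artifact fails, so the finding ships with its proof.
            reportable.append(finding.summary)
        else:
            # Assumption: an artifact that passes means the finding was not real.
            dropped.append(finding.summary)
    return reportable, questions, dropped
```

Keeping the three outcomes separate makes it visible how many suspicions were tested and disproved, as opposed to never tested at all.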

Scope

  • In scope: ogur/engine/*.py and tests/unit/engine/test_*.py
  • Out of scope: sources, store, API, frontend. Covered by other reviews.
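If the scope ever needs to be applied mechanically, for instance to filter a list of changed files before review, the two patterns above could be checked as follows. The helper in_scope is hypothetical, and the sketch assumes that "*" does not cross directory boundaries, so files in subpackages of ogur/engine/ would not match.

```python
from fnmatch import fnmatchcase

# Patterns copied from the scope list above.
IN_SCOPE = ("ogur/engine/*.py", "tests/unit/engine/test_*.py")


def in_scope(path: str) -> bool:
    """Return True if a repo-relative POSIX path matches an in-scope pattern.

    Assumption: each "*" matches within a single path component only.
    """
    parts = path.split("/")
    for pattern in IN_SCOPE:
        pattern_parts = pattern.split("/")
        # Compare component by component so "*" cannot swallow a "/".
        if len(parts) == len(pattern_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False
```

Matching per component is a deliberate choice here: plain fnmatch on the whole string would let "*" match across "/" and pull nested files into scope.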

How it differs from existing review surfaces

Skill | Bias | Best for
/code-review (built-in) | Reports findings on confidence | Quick PR review across any code
/code-review --effort ultra | Multi-agent cloud review; broader coverage | Big diffs or pre-merge confidence
/review-engine (this skill) | Refuses to report unproven findings | Engine-touching PRs where silent failure is the cost model

They compose: run /review-engine first to get the dense-evidence findings, then /code-review if broader coverage is wanted. The two will overlap on confirmed bugs and disagree on weak signals; that is by design.

When to invoke

  • Before requesting review on any PR that touches ogur/engine/
  • After a refactor under ogur/engine/ even if the PR is "no functional change" (refactors silently break invariants)
  • When CI is green but you don't trust the change yet: the gate runs lint and tests, while this skill checks the contracts the tests don't cover

Future versions

v1 covers the engine. Planned expansions if the noise-control payoff holds:

  • v2 — add ogur/sources/*.py (10 source modules, well-bounded by content_hash and the Pydantic Signal contract)
  • v3 — /review-frontend companion skill once the frontend has a similar "silent failure" surface (likely around ConfidenceBadge and SourceChip contracts)
  • Eventually pair with /adversarial-review (Workstream C) for the highest-stakes prompt and schema changes.

Implementing rule of thumb

When in doubt about whether to ship a finding: try to write the failing test first. If the test doesn't fail, the finding wasn't real. If it does fail, ship it with the proof attached. This is the whole skill, distilled to one line.
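As an illustration of writing the proof first, the sketch below builds a DB state with a duplicated content_hash and a check that fails on it. It uses the standard-library sqlite3 module in memory rather than the project's patch_db fixture, and the table layout and function names are hypothetical. In a real artifact the inserts would go through the code path under review, and the check would live in a pytest test function.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical table; the real schema lives in the project, not here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signal (id INTEGER PRIMARY KEY, content_hash TEXT)"
    )
    return conn


def duplicate_hashes(conn: sqlite3.Connection) -> list:
    """Return every content_hash that appears on more than one row."""
    rows = conn.execute(
        "SELECT content_hash FROM signal "
        "GROUP BY content_hash HAVING COUNT(*) > 1 "
        "ORDER BY content_hash"
    )
    return [row[0] for row in rows]


def check_content_hash_unique() -> None:
    conn = make_db()
    # Stand-in for the suspected bug: the same payload is stored twice.
    conn.executemany(
        "INSERT INTO signal (content_hash) VALUES (?)",
        [("abc",), ("abc",), ("def",)],
    )
    # Raises AssertionError while duplicates exist; that failure is the proof.
    assert duplicate_hashes(conn) == []
```

If the check passes against the real code path, the suspicion is withdrawn; if it raises, the failing check is attached to the finding.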