`/review-engine` skill: contract
Project-local Claude Code skill at `.claude/skills/review-engine/SKILL.md`. Invoked with `/review-engine` on a branch with engine changes.
Why this exists
The OGUR intelligence engine (`ogur/engine/`) is the silent-failure surface of the product. A wrong prompt produces plausible output for the wrong indication. A schema mismatch only fails at the verifier seam. A patch-where-used violation passes type-checking and lints clean but breaks test isolation.
Standard code review (including `/code-review` and `/code-review --effort ultra`) reports findings based on the reviewer's confidence. Most reviewers, human or LLM, over-report because the cost of a false positive feels lower than the cost of missing a bug. In practice, false positives are the more expensive failure mode: they burn the author's attention on non-bugs and erode trust in future findings.
`/review-engine` inverts the bias. A finding is reportable only if it comes with a runnable artifact that fails. Suspicions without proof are downgraded to questions for the human reader. The output is dense with evidence, not broad with opinion.
The contract
| Category | Required artifact | Example |
|---|---|---|
| A: Engine/data correctness | A failing pytest using the factories in `tests/conftest.py` | `make_signal(...)`, `make_drug_profile(...)`, `make_kiq(...)`, `make_landscape(...)` |
| B: LLM-stage contract | A model output that `validate_kiq_answers` or `validate_entity_references` rejects, or a Pydantic schema validation error | `ogur.engine.verification.validate_kiq_answers(answers, expected)` |
| C: Invariant violations | A DB state that violates `content_hash` uniqueness, `DrugSynonym` normalization, `DetectedChange` value-object, or patch-where-used | `StaticPool` in-memory SQLite via the `patch_db` fixture |
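For category C, the artifact is a database state plus the query that exposes it. The sketch below uses the standard-library `sqlite3` module in place of the project's `patch_db` fixture, which uses `StaticPool` in-memory SQLite. The `signals` table, its columns, and the `store_signals` helper are hypothetical illustrations, not the engine's real schema or code. The sketch also assumes that `content_hash` uniqueness is enforced in application code rather than by a database constraint. That is the case where a bug can leave a violating state behind.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory DB with a HYPOTHETICAL schema: no UNIQUE constraint on
    # content_hash, so only application code guards the invariant.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signals ("
        "id INTEGER PRIMARY KEY, content_hash TEXT NOT NULL, title TEXT)"
    )
    return conn


def store_signals(conn: sqlite3.Connection, signals: list[tuple[str, str]]) -> None:
    # HYPOTHETICAL engine helper with a planted bug: it dedupes against rows
    # already stored, but never updates `existing` within the incoming batch.
    existing = {row[0] for row in conn.execute("SELECT content_hash FROM signals")}
    for content_hash, title in signals:
        if content_hash not in existing:
            conn.execute(
                "INSERT INTO signals (content_hash, title) VALUES (?, ?)",
                (content_hash, title),
            )
    conn.commit()


def duplicate_hashes(conn: sqlite3.Connection) -> dict[str, int]:
    # The artifact: every content_hash stored more than once, with its count.
    rows = conn.execute(
        "SELECT content_hash, COUNT(*) FROM signals "
        "GROUP BY content_hash HAVING COUNT(*) > 1"
    ).fetchall()
    return dict(rows)


conn = make_db()
store_signals(conn, [("abc123", "Trial readout"), ("abc123", "Trial readout (dup)")])
print(duplicate_hashes(conn))  # {'abc123': 2}
```

A non-empty result from `duplicate_hashes` is the proof. An empty one means the suspicion goes to the questions section.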
If none of the artifacts in the table can be produced, the finding goes in a "Questions for the human" section, phrased as a question. The reader can then decide whether to investigate.
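The reporting rule can be sketched as a triage step. Each finding's artifact is run. The finding is reported only if the artifact fails, and otherwise it is rephrased as a question. `Finding`, `triage`, and the convention that an artifact is a zero-argument callable are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    category: str  # "A", "B", or "C", matching the contract table
    claim: str  # the suspected bug, phrased as a statement
    # HYPOTHETICAL convention: a zero-argument callable that raises
    # AssertionError when the bug is real, like a failing test.
    artifact: Optional[Callable[[], None]] = None


def triage(findings: list[Finding]) -> tuple[list[tuple[Finding, str]], list[str]]:
    """Split findings into proven reports and questions for the human."""
    reports: list[tuple[Finding, str]] = []
    questions: list[str] = []
    for finding in findings:
        if finding.artifact is not None:
            try:
                finding.artifact()
            except AssertionError as exc:
                # The artifact failed: report it with the failure attached.
                reports.append((finding, str(exc)))
                continue
            except Exception:
                # The artifact crashed for another reason (e.g. a typo in
                # the test itself), which proves nothing about the claim.
                pass
        # No artifact, a passing artifact, or a broken one: downgrade.
        questions.append(f"Could it be that {finding.claim}?")
    return reports, questions
```

Counting only `AssertionError` as proof is a deliberate choice in this sketch. An artifact that crashes on a typo has shown nothing about the claim, so it is downgraded just like one that passes. Under this convention, a category B artifact would assert that the validator accepts the model output, so a rejection surfaces as an assertion failure.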
Scope
- In scope: `ogur/engine/*.py` and `tests/unit/engine/test_*.py`
- Out of scope: sources, store, API, frontend. Covered by other reviews.
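A quick way to check whether a branch is in scope is to match its changed paths against the two in-scope globs. The sketch assumes repo-relative POSIX paths, such as the output of `git diff --name-only main...HEAD` (the base branch name `main` is an assumption). `in_scope` and `touches_scope` are hypothetical helper names. The globs are read literally, so files in subpackages under `ogur/engine/` would not match.

```python
from pathlib import PurePosixPath

# In-scope globs from the contract (repo-relative POSIX paths assumed).
IN_SCOPE_GLOBS = ("ogur/engine/*.py", "tests/unit/engine/test_*.py")


def in_scope(path: str) -> bool:
    """True if a changed file falls under the /review-engine scope."""
    p = PurePosixPath(path)
    # PurePath.match() anchors relative patterns at the right-hand end, so
    # also require the same depth to reject nested or prefixed paths.
    return any(
        len(p.parts) == len(PurePosixPath(glob).parts) and p.match(glob)
        for glob in IN_SCOPE_GLOBS
    )


def touches_scope(changed_paths: list[str]) -> bool:
    """True if any changed file is in scope, i.e. the branch qualifies."""
    return any(in_scope(path) for path in changed_paths)


changed = ["frontend/src/App.tsx", "ogur/engine/verification.py"]
print(touches_scope(changed))  # True
```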
How it differs from existing review surfaces
| Skill | Bias | Best for |
|---|---|---|
| `/code-review` (built-in) | Reports findings on confidence | Quick PR review across any code |
| `/code-review --effort ultra` | Multi-agent cloud review; broader coverage | Big diffs or pre-merge confidence |
| `/review-engine` (this skill) | Refuses to report unproven findings | Engine-touching PRs where silent failure is the cost model |
They compose: run `/review-engine` first to get the dense-evidence findings, then `/code-review` for broader coverage if you want it. The two will overlap on confirmed bugs and disagree on weak signals, by design.
When to invoke
- Before requesting review on any PR that touches `ogur/engine/`
- After a refactor under `ogur/engine/`, even if the PR is "no functional change" (refactors silently break invariants)
- When CI is green but you don't trust the change yet: the CI gate runs lint and tests, while this skill checks the contracts the tests don't cover
Future versions
v1 covers the engine. Planned expansions if the noise-control payoff holds:
- v2: add `ogur/sources/*.py` (10 source modules, well-bounded by `content_hash` and the Pydantic `Signal` contract)
- v3: a `/review-frontend` companion skill once the frontend has a similar "silent failure" surface (likely around the `ConfidenceBadge` and `SourceChip` contracts)
- Eventually, pair with `/adversarial-review` (Workstream C) for the highest-stakes prompt and schema changes.
Implementing rule of thumb
When in doubt about whether to ship a finding: try to write the failing test first. If the test doesn't fail, the finding wasn't real. If it does fail, ship it with the proof attached. This is the whole skill, distilled to one line.