Testing¶
Unit tests live under tests/unit/ and mirror the ogur/ package layout. Shared fixtures live in tests/conftest.py.
tests/
    __init__.py
    conftest.py         Shared fixtures + factories
    unit/
        api/            /api/signals, /api/briefing, /api/ask, /health
        engine/         detector, classifier, enricher, synthesizer, query,
                        pipeline, orchestrator, analyzers,
                        evidence_pipeline, verification
        extractor/      entity_extractor, outcomes_extractor,
                        compound_span_postprocessor, ct_gov_v2_parser,
                        matcher, outcomes_eval
        sources/        One test per source + retry/hash base tests
        store/          signals, briefings, companies, targets,
                        drug_store, evidence_store, kiqs
        scripts/        run_evidence_pipeline_cli (script smoke-tests)
Current state: 785 passing + 1 skipped (test_gliner_extraction_quality, needs the GLiNER ML stack) + 1 deselected (integration marker). make test runs in ≈ 5 seconds because every test uses in-memory SQLite and mocks Anthropic.
Running tests¶
make test # full suite
make test-v # verbose
make test-fast # stop on first failure
make test-file F=tests/unit/engine/test_detector.py
Under the hood, make test runs uv run --extra dev python -m pytest tests/. Pytest auto-discovers conftest.py files at any directory level above the test files.
Integration tests¶
Any test marked @pytest.mark.integration makes live network calls and is excluded from make test. Run them explicitly:
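The exact command isn't reproduced here; a typical invocation (an assumption — check the Makefile or pytest config for the canonical one) selects by marker:

```shell
# Run only tests carrying the integration marker (live network calls)
uv run --extra dev python -m pytest tests/ -m integration
```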
These are for hand-verifying source contracts against real APIs. Don't rely on them in CI.
The DB fixture pattern¶
The tricky part of testing is that production code calls get_session() in several modules (detector, enricher, query, every store). For tests to share data between these calls, they must share a single connection — the default SQLAlchemy pool gives each Session its own connection to :memory:, which gives each Session its own empty database.
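The underlying SQLite behavior is easy to see with the stdlib sqlite3 module alone (no SQLAlchemy involved):

```python
import sqlite3

# Each connection to ":memory:" is a brand-new, private database.
a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
a.execute("CREATE TABLE t (x INTEGER)")
a.execute("INSERT INTO t VALUES (1)")
a.commit()

try:
    b.execute("SELECT x FROM t")
    shared = True
except sqlite3.OperationalError:  # "no such table: t"
    shared = False

assert shared is False  # b never sees a's table
```

This is exactly what happens when each SQLAlchemy Session checks out its own connection from the default pool.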
Solution: StaticPool on in-memory SQLite so every Session hits the same in-process DB.
# tests/conftest.py
from sqlalchemy import create_engine
from sqlalchemy.pool import StaticPool

engine = create_engine(
    "sqlite:///:memory:",
    connect_args={"check_same_thread": False},
    poolclass=StaticPool,
)
patch_db then replaces get_session in every module that imports it, pointing them all at this shared engine. See conftest.py:58.
Patch-where-used, not where-defined¶
The load-bearing rule. Python imports the name into the calling module at import time. If detector does from ogur.store.database import get_session, then detector.get_session and ogur.store.database.get_session are two different names bound to the same function. Patching the second one doesn't affect the first.
Correct: patch the name where it is used, i.e. target ogur.engine.detector.get_session.
Wrong — the test passes against the mock but prod code still hits the real DB: patching only where it is defined, ogur.store.database.get_session.
Every engine and store module that calls get_session is enumerated in the patch_db fixture so tests don't have to remember the list.
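The rule can be demonstrated end-to-end with toy modules standing in for ogur.store.database and the detector (module names here are stand-ins, not real ogur paths):

```python
import sys
import types
from unittest.mock import patch

# Stand-in for ogur.store.database
database = types.ModuleType("fake_database")
database.get_session = lambda: "real-session"
sys.modules["fake_database"] = database

# Stand-in for the detector: `from ... import get_session` copies the binding
detector = types.ModuleType("fake_detector")
detector.get_session = database.get_session
sys.modules["fake_detector"] = detector

def run_detector():
    return detector.get_session()  # name is looked up in detector's module

# Patching where it's DEFINED leaves the detector's binding untouched:
with patch("fake_database.get_session", lambda: "mock-session"):
    assert run_detector() == "real-session"

# Patching where it's USED is what the tests need:
with patch("fake_detector.get_session", lambda: "mock-session"):
    assert run_detector() == "mock-session"
```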
Fixtures¶
| Fixture | Provides |
|---|---|
| db_engine | Fresh in-memory engine with all tables |
| db_session | Session bound to db_engine |
| mock_get_session | A context manager that yields sessions on db_engine |
| patch_db | Patches get_session in every engine + store module |
| patch_sources_db | Patches get_session in source modules |
| api_client | FastAPI TestClient with get_db dependency-overridden |
Factories¶
Import from tests.conftest (the absolute path still works from subdirs):
- make_signal(drug_name=…, source=…, signal_type=…, severity=…, …) — default is a Merck pembrolizumab Phase 3 NSCLC trial-registered signal.
- make_drug_profile(normalized_name=…, phase=…, company=…, target=…, indication=…)
- make_briefing(landscape_id=…, executive_summary=…)
- make_landscape(id=…, name=…, indication=…, …)
- make_company_profile(normalized_name=…, display_name=…, …)
- make_target(normalized_name=…, display_name=…, …)
- make_drug_target(drug_name=…, target_name=…, …)
- make_kiq(landscape_id=…, question=…, time_horizon=…, priority=…, active=…, id=…) — KIQ row with sensible defaults; mirrors the others.
- make_anthropic_mock(response_text) — builds a MagicMock Anthropic client that returns response_text from messages.create.
All factories accept overrides — if you need a specific severity or source, pass it as a kwarg.
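The pattern is defaults-plus-overrides. A minimal sketch — the field defaults below beyond the documented pembrolizumab baseline are assumptions, and the real factories return ORM model instances rather than dicts:

```python
def make_signal(**overrides):
    # Baseline from the docs: a Merck pembrolizumab Phase 3 NSCLC
    # trial-registered signal (severity default here is assumed).
    defaults = {
        "drug_name": "pembrolizumab",
        "company": "Merck",
        "phase": "Phase 3",
        "indication": "NSCLC",
        "signal_type": "trial_registered",
        "severity": "medium",
    }
    return {**defaults, **overrides}

sig = make_signal(severity="high", source="pubmed")
assert sig["severity"] == "high"            # override applied
assert sig["drug_name"] == "pembrolizumab"  # untouched defaults kept
```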
API tests¶
Use the api_client fixture (TestClient with get_db overridden):
def test_list_signals(api_client, db_session):
    db_session.add(make_signal(drug_name="pembrolizumab"))
    db_session.commit()

    response = api_client.get("/api/signals?drug_name=pembrolizumab")
    assert response.status_code == 200
    assert len(response.json()) == 1
Source tests¶
Use pytest-httpx to intercept HTTP without patching. The httpx_mock fixture is auto-loaded:
@pytest.mark.asyncio
async def test_source_fetches_and_normalizes(httpx_mock):
    httpx_mock.add_response(
        url="https://api.example.com/search",
        json={"results": [...]},
    )
    signals = await MySource().fetch(make_landscape())
    assert signals[0].source == "mysource"
Sources that hit the DB directly (e.g. PubMed looking up DrugProfiles) need patch_sources_db in addition.
LLM-calling tests¶
Use make_anthropic_mock:
def test_classifier_scores_and_filters(patch_db, db_session):
    mock_client = make_anthropic_mock('{"scores":[8,3,6]}')
    with patch.object(SignalClassifier, "client", mock_client):
        classifier = SignalClassifier()
        result = classifier.classify([change_a, change_b, change_c])
        assert len(result) == 2  # scores 8 and 6 pass the ≥5 threshold
The rule: we do not test prompt wording. We test that the classifier correctly parses the response, applies the ≥5 threshold, falls back on API error, and rebalances by source quota. Prompt content is a product decision, not a test surface.
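What "parses the response and applies the threshold" amounts to can be sketched in isolation (the function name and shape here are hypothetical; the real logic lives in the classifier):

```python
import json

THRESHOLD = 5  # from the text: scores >= 5 survive

def filter_by_scores(changes, response_text):
    """Parse the (mocked) LLM JSON response and keep changes at or above THRESHOLD."""
    scores = json.loads(response_text)["scores"]
    return [c for c, s in zip(changes, scores) if s >= THRESHOLD]

kept = filter_by_scores(["change_a", "change_b", "change_c"], '{"scores":[8,3,6]}')
assert kept == ["change_a", "change_c"]  # 8 and 6 pass; 3 is dropped
```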
What's not covered¶
- No end-to-end tests (seed → briefing → API → frontend). The pipeline test in tests/unit/engine/test_pipeline.py runs the detect/classify/enrich/synthesize chain against an in-memory DB with mocked LLMs — that's the closest we have.
- No load tests. The API is currently single-tenant, and signal volume per landscape is ~1,500 rows.
- No frontend tests. The React codebase has no test harness. This is on the backlog.
Known warnings¶
The suite emits ~671 "DeprecationWarning: datetime.datetime.utcnow() is deprecated" warnings. datetime.utcnow() is scheduled for removal in a future Python version; migrating to datetime.now(timezone.utc) is a separate cleanup task. Not blocking.
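For reference, the eventual migration is mechanical:

```python
from datetime import datetime, timezone

# Old (deprecated, returns a naive datetime): datetime.utcnow()
# New (returns a timezone-aware datetime):
now = datetime.now(timezone.utc)
assert now.tzinfo is timezone.utc
```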