ADR-0089: Deterministic compliance scoring + entity-risk one-way ratchet + fail-closed re-screen

Date: 2026-07-03
Status: Accepted
Deciders: Adrian (Soft4U), Claude Opus 4.8

Related: ADR-0020 (EBA weighted-max risk matrix), ADR-0066 (live risk-paced UBO re-screening / entity baselines), ADR-0067 (fail-closed / never report clear — the "not assessed" contract), ADR-0070 (maker-checker / four-eyes), ADR-0075 (document-content adverse analysis — docs can RAISE risk), ADR-0082 (post-validation calibration & integrity hardening — the recalc never-suppress guard, the same class of defect at the case layer), PR #176 (display-risk one-way ratchet reconcile_display_risk)

ADR-number note: the design (docs/superpowers/specs/2026-07-03-determinism-risk-ratchet-design.md §7) and plan reserved "ADR-0083" as a placeholder; between authoring and implementation the AMLA monitoring-framework work landed and consumed ADR-0083 through ADR-0088. Per the spec's explicit instruction ("reserve next free number at authoring time"), this decision is recorded as ADR-0089 — the next free number — to keep the register append-only and non-colliding.

Context

A live re-run on OB Holding 1 OÜ (Estonian PSP, registrikood 14975047) assessed the same legal entity, on identical inputs, as CRITICAL/90 in one run (case_d9d389cf06bb: 4 criminal/EPPO findings surfaced, so the composite was floored to 90) and as medium/51 in another (case_a69b80f57d51: 1 finding, too weak to trigger the floor). For a compliance engine that is unacceptable on its own; worse, the lower assessment silently overwrote the entity's persisted baseline.

Root causes (evidence-audited, systematic-debugging Phase 1–2):

Baseline upsert blindly overwrites. entity_baseline_service.upsert_baseline used an on_conflict_do_update that set latest_risk_score/latest_risk_tier unconditionally — no floor, no comparison to the established value. A lower-recall re-screen silently downgraded an entity's persistent risk. This directly violates the project's cardinal principle (mission memory, ADR-0067): the system may ADD scrutiny but must NEVER suppress a risk signal.
Scoring agents sample. synthesis_agent ran at temperature=0.1 (mcc=0.1, case_intelligence=0.2, memo=0.3, finding_debugger=0.1, task_generator=0.1, belgian=0.1). Even on identical evidence the score could drift across runs.
No fail-closed on a retrieval gap. When an adverse-media/SERP check hit a data gap (the criminal article was not retrieved), the score dropped without a hard floor or a prominent "not assessed" signal: a false negative on a criminal finding, presented as a confident medium.

The inherent limit is honest: raw live-web retrieval recall (BrightData/Tavily/Google) cannot be made deterministic. This decision does not pretend otherwise — it removes the two axes that have no excuse (LLM sampling, blind persistence) and makes retrieval gaps fail closed so a recall miss can never lower an entity's risk.

Decision

Enforce one invariant:

Per-entity risk is monotonic absent an explicit, audited downgrade. A re-screen may raise or maintain an entity's risk; it may never silently lower it. The only path down is an officer decision through maker-checker (ADR-0070), recorded in audit_events.

Five components enforce it:

A — Deterministic scoring (temperature=0). Pin every agent whose output is persisted or shown to a regulator to temperature=0 (greedy decoding): synthesis_agent, mcc_classifier, case_intelligence_agent, memo_justification_agent, finding_debugger, task_generator, belgian_agent. dashboard_agent stays at 0.7 (interactive chat UX, no persisted verdict). A guard unit test asserts each compliance-output agent's ModelSettings.temperature == 0 so a future edit cannot silently re-introduce sampling. Anthropic/OpenAI honour 0 as greedy decoding; that is sufficient — the residual is documented, not hidden.
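The guard test described in Component A could look roughly like the sketch below. It is illustrative only: the registry names (COMPLIANCE_AGENT_SETTINGS, EXEMPT_AGENT_SETTINGS) and the helper agents_with_sampling are hypothetical, and plain dicts stand in for the real pydantic-ai ModelSettings objects, whose location in the codebase is not shown in this ADR.

```python
# Hypothetical registry: agent name -> model settings.
# Plain dicts stand in for the project's real ModelSettings objects.
COMPLIANCE_AGENT_SETTINGS = {
    "synthesis_agent": {"temperature": 0},
    "mcc_classifier": {"temperature": 0},
    "case_intelligence_agent": {"temperature": 0},
    "memo_justification_agent": {"temperature": 0},
    "finding_debugger": {"temperature": 0},
    "task_generator": {"temperature": 0},
    "belgian_agent": {"temperature": 0},
}

# Interactive chat agent, deliberately outside the guard.
EXEMPT_AGENT_SETTINGS = {"dashboard_agent": {"temperature": 0.7}}


def agents_with_sampling(settings_by_agent):
    """Return the names of agents whose temperature is missing or non-zero."""
    return sorted(
        name
        for name, settings in settings_by_agent.items()
        if settings.get("temperature") != 0
    )


def test_compliance_agents_are_pinned_to_temperature_zero():
    # Fails loudly, naming the offending agents, if sampling is re-introduced.
    assert agents_with_sampling(COMPLIANCE_AGENT_SETTINGS) == []
```

Treating a missing temperature as a failure is a design choice in this sketch: an agent that falls back to a provider default would otherwise slip past the guard.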

B — Entity-risk one-way ratchet (core). Mirror reconcile_display_risk (PR #176, which ratchets a single case's display risk up) at the persistent baseline layer. A new pure function reconcile_baseline_risk(established, incoming) -> BaselineDecision compares tier rank (critical>high>medium>low>clear) then score: on raise-or-maintain the incoming value becomes effective; on a would-be downgrade the established value is held, the incoming raw run is recorded in last_run_risk_*, and a divergence_state payload is stored (pending audited downgrade). upsert_baseline calls this before writing; the on_conflict_do_update set_ applies the reconciled decision, never the raw incoming values. next_review is computed from the effective tier — a downgrade can never loosen review cadence either. The existing latest_risk_score/latest_risk_tier columns are the effective/established value (one source of truth, no reader migration).
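A minimal sketch of the ratchet comparison in Component B follows. Only the function name reconcile_baseline_risk, the BaselineDecision return type, and the tier order come from this ADR; the Risk dataclass, the field names, and the divergence_state payload shape are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Tier order from the decision text: critical > high > medium > low > clear.
TIER_RANK = {"clear": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Risk:  # hypothetical value type
    tier: str
    score: int


@dataclass(frozen=True)
class BaselineDecision:
    effective: Risk                   # written to latest_risk_tier / latest_risk_score
    last_run: Risk                    # written to last_run_risk_tier / last_run_risk_score
    divergence_state: Optional[dict]  # None unless a would-be downgrade was held


def _key(risk: Risk) -> tuple:
    # Compare tier rank first, then score.
    return (TIER_RANK[risk.tier.lower()], risk.score)


def reconcile_baseline_risk(established: Optional[Risk], incoming: Risk) -> BaselineDecision:
    """Pure function: no I/O, so it can be unit-tested without a database."""
    if established is None or _key(incoming) >= _key(established):
        # First screen, raise, or maintain: the incoming value becomes effective.
        return BaselineDecision(incoming, incoming, None)
    # Would-be downgrade: hold the established value and record the divergence.
    divergence = {
        "status": "pending_audited_downgrade",  # hypothetical payload shape
        "held": {"tier": established.tier, "score": established.score},
        "incoming": {"tier": incoming.tier, "score": incoming.score},
    }
    return BaselineDecision(established, incoming, divergence)


# The incident from the Context section, replayed through the ratchet.
held = reconcile_baseline_risk(Risk("critical", 90), Risk("medium", 51))
print(held.effective)  # Risk(tier='critical', score=90)
```

Because next_review is derived from the effective tier, a caller following this sketch would compute it from held.effective, never from held.last_run.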

C — Material-findings persistence & re-injection. (Component; scheduled for a later task in this plan.) Material findings (criminal/enforcement/sanctions/adverse_media/freeze/regulatory_action) are fingerprinted (stable hash over type + subject + normalized claim), persisted on the baseline (established_findings, union-only, never subtract) and re-injected into a re-screen that misses them, so the established floor re-applies.
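One way the fingerprint, union-only merge, and re-injection in Component C might fit together is sketched below. The finding dict keys (type, subject, claim), the normalization rule, the choice of SHA-256, and the reinjected marker are all assumptions; the ADR fixes only the hashed fields and the union-only rule.

```python
import hashlib
import re

MATERIAL_TYPES = frozenset(
    {"criminal", "enforcement", "sanctions", "adverse_media", "freeze", "regulatory_action"}
)


def normalize_claim(claim: str) -> str:
    # Assumed normalization: collapse whitespace and lowercase.
    return re.sub(r"\s+", " ", claim).strip().lower()


def fingerprint(finding: dict) -> str:
    """Stable hash over type + subject + normalized claim."""
    parts = (
        finding["type"],
        finding["subject"].strip().lower(),
        normalize_claim(finding["claim"]),
    )
    # The unit-separator character keeps field boundaries unambiguous.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def merge_established(established: list, run_findings: list) -> list:
    """Union-only merge: material findings are added, never removed."""
    merged = {fingerprint(f): f for f in established}
    for finding in run_findings:
        if finding["type"] in MATERIAL_TYPES:
            merged.setdefault(fingerprint(finding), finding)
    return list(merged.values())


def reinject_missing(established: list, run_findings: list) -> list:
    """Append established findings that this run failed to retrieve."""
    seen = {fingerprint(f) for f in run_findings}
    missing = [dict(f, reinjected=True) for f in established if fingerprint(f) not in seen]
    return run_findings + missing
```

Marking re-injected findings (here with a reinjected flag) would let an officer tell a freshly retrieved finding from one carried over from the baseline.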

D — Fail-closed on material-check data-gap. (Component; later task.) A data-gapped material check sets a structured material_check_incomplete flag, penalises confidence, renders the ADR-0067 "not assessed" banner, and hard-floors the reconciled risk — a data-gap can never enable a downgrade.
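The fail-closed behaviour in Component D can be sketched as a single pure step applied after scoring. Everything named here is hypothetical (the Assessment type, apply_fail_closed, the 0.25 confidence penalty, and the banner wording); the ADR specifies only the four effects: set the flag, penalise confidence, show the "not assessed" banner, and hold the floor.

```python
from dataclasses import dataclass, replace
from typing import Optional

TIER_RANK = {"clear": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Assessment:  # hypothetical result type
    tier: str
    score: int
    confidence: float
    material_check_incomplete: bool = False
    banner: Optional[str] = None


def apply_fail_closed(result, floor, gapped_checks, confidence_penalty=0.25):
    """Apply the data-gap rules.

    floor is the established (tier, score) pair, or None on a first screen.
    gapped_checks is the set of material checks that returned no data.
    """
    if not gapped_checks:
        return result
    tier, score = result.tier, result.score
    if floor is not None and (TIER_RANK[tier], score) < (TIER_RANK[floor[0]], floor[1]):
        # A gapped run can never sit below the established floor.
        tier, score = floor
    return replace(
        result,
        tier=tier,
        score=score,
        confidence=max(0.0, result.confidence - confidence_penalty),
        material_check_incomplete=True,
        banner="Not assessed: " + ", ".join(sorted(gapped_checks)),
    )
```

For example, a medium/51 run with a gapped adverse_media check against a critical/90 floor would come back as critical/90 with the flag set, reduced confidence, and the banner populated.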

E — Audited downgrade path. (Component; later task.) The only way risk goes down: a MonitoringAlert(trigger_type=risk_divergence, priority=high) is raised on Component-B divergence (append-only enum member, monitoring_alerts.trigger_type is String(50) — no DB migration for the member). Lowering the baseline requires an explicit officer decision routed through maker-checker (ADR-0070): a second, different approver; audit_events records risk_downgrade_approved with maker, checker, and reason.
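The approval rule in Component E reduces to a small check before the baseline is lowered. The sketch below is an assumption-laden illustration: approve_downgrade, DowngradeRejected, and the event and divergence_state dict shapes are hypothetical, and the real path would go through the existing maker_checker module and audit_events table rather than return a dict.

```python
from datetime import datetime, timezone


class DowngradeRejected(Exception):
    """Raised when a downgrade request does not satisfy maker-checker."""


def approve_downgrade(divergence_state, maker, checker, reason):
    """Validate a downgrade and build the audit event to be persisted.

    divergence_state is the pending payload stored on the baseline
    (assumed here to carry 'held' and 'incoming' risk values).
    """
    if divergence_state is None:
        raise DowngradeRejected("no pending divergence to approve")
    if not maker or not checker:
        raise DowngradeRejected("maker and checker are both required")
    if maker == checker:
        raise DowngradeRejected("checker must be a different approver than maker")
    if not reason or not reason.strip():
        raise DowngradeRejected("a reason is required")
    return {
        "event_type": "risk_downgrade_approved",
        "maker": maker,
        "checker": checker,
        "reason": reason.strip(),
        "from": divergence_state["held"],
        "to": divergence_state["incoming"],
        "approved_at": datetime.now(timezone.utc).isoformat(),
    }
```

Raising instead of returning a boolean keeps the failure reason attached to the rejected attempt, which fits the goal of a complete trail for every attempted downgrade.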

Schema (Alembic migration 080, additive, down_revision 079). Five columns on entity_baselines, all nullable/defaulted (safe online, no backfill — latest_* already holds the effective value): last_run_risk_score (Integer), last_run_risk_tier (String), established_findings (JSONB, default []), divergence_state (JSONB, nullable), material_check_incomplete (Boolean, default false).

Decision context:

Latency: temperature=0 changes decoding, not call cost — no measurable p50/p95 change. The ratchet adds one indexed SELECT on the baseline identity per upsert_baseline (already inside the same tenant session) — sub-millisecond, off the request hot path (runs in the Temporal activity).
Dependency surface: zero new packages. Reuses pydantic-ai ModelSettings, SQLAlchemy async, the existing MonitoringAlertService and maker_checker modules.
Debuggability: a downgrade attempt leaves a divergence_state payload on the baseline, a RISK_DIVERGENCE alert, and (on approval) an immutable audit_events entry. That is a full forensic trail of every attempted and every approved downgrade, readable at 3am from SQL alone. The last_run_* columns show the raw run that diverged from the held floor.
Reversibility: Component A is a one-line-per-agent config flip. Component B is additive columns + a pure function wired into one write path; the migration downgrade() drops the 5 columns. No data is destroyed by rollback (the effective value was always in latest_*).
Blast radius: additive. Every existing reader of latest_risk_score/tier keeps working unchanged and simply gains the ratchet guarantee. The only behavioural change is that a re-screen can no longer lower a persisted baseline without audit.
Alternative considered: score-floor-only (no findings persistence) — rejected because the actual miss is a dropped finding, not just a lower number; a coarse floor would hold the tier but lose the specific EPPO finding an officer must see (Component C addresses this).

Consequences

Positive

The CRITICAL→medium flip on identical inputs becomes structurally impossible: a re-screen can only surface same-or-higher risk, or raise an auditable divergence for a four-eyes human downgrade (ADR-0070).
Scoring is deterministic (temp 0): identical evidence → identical score/tier, so calibration regressions are reproducible instead of intermittent.
One source of truth for "the entity's risk" (latest_* = effective) — no reader migration, no second established_* column to keep in sync.
Every downgrade is attributable (maker + checker + reason in immutable audit_events), satisfying AMLR 5-yr audit-trail and EU AI Act Art. 12 logging obligations.

Negative

Over-scrutiny bias. A held floor means a genuinely improved entity keeps its higher tier until an officer clears the divergence — deliberate (over-scrutiny is safe; silent under-scrutiny is the regulatory failure) but it adds officer workload (a RISK_DIVERGENCE task per downward re-screen).
Downgrade friction. Lowering risk now requires two distinct approvers; a single officer can no longer correct a stale-high baseline alone.
Extra columns + write complexity. Five new columns and a read-before-write in the upsert path; upsert_baseline is no longer a pure blind insert.
temperature=0 is not bit-level determinism across provider infra/model-version changes — it removes sampling, not every source of drift. Documented as a residual, not eliminated.

Neutral

dashboard_agent is intentionally excluded (interactive chat, no persisted/regulator-facing verdict) and stays at 0.7.
Components C/D/E are recorded here as the committed design but land in later tasks of the same plan; A and B land first.

Alternatives Considered

Alternative 1: Do nothing (accept the divergence as retrieval noise)

Treat the CRITICAL↔medium flip as an inherent property of non-deterministic web retrieval and document it as a known limitation.
Why rejected: two of the three root causes (LLM sampling, blind baseline overwrite) are not retrieval noise — they are fixable engine defects. Leaving a lower-recall run to silently overwrite a CRITICAL baseline is a direct, repeatable violation of the never-suppress principle and an AML false-negative.

Alternative 2: Make retrieval deterministic (cache/pin the web corpus)

Snapshot every source and replay it so re-screens are byte-identical.
Why rejected: impossible in practice and wrong for perpetual KYC — the whole point of a re-screen is to observe new real-world signals. A frozen corpus would defeat monitoring. The honest fix is persistence + fail-closed, not fake determinism.

Alternative 3: Score-floor only (ratchet the number, drop finding persistence)

Hold latest_risk_score at its max but re-derive findings fresh each run.
Why rejected: the floor holds the tier but the specific material finding (the EPPO criminal investigation) still vanishes from the re-screen's finding list, so an officer reading the latest run sees a medium-looking finding set behind a critical number — an internal contradiction. Component C (persist + re-inject the finding itself) is required for a coherent, regulator-defensible record.
