ADR-0084: Monitoring-Alert Disposition Lifecycle
Date: 2026-07-03
Status: Accepted (implemented 2026-07-03, plan 2026-07-03-amla-w2-alert-disposition)
Deciders: Adrian (Soft4U), Claude Fable 5 (AMLA remediation design session)
Revision note (implementation, 2026-07-03): implemented as
backend/app/services/monitoring_alert_service.py, class MonitoringAlertService — NOT
alert_service.py as originally drafted below (§3). app/services/alert_service.py
pre-exists as the ADR-0025 cross-case pattern-alert service and is untouched by this ADR. See
the architecture-doc amendment (2026-07-03) in the W2 plan, Task 1.
Decision context:
- Latency: disposition transitions are a single-row UPDATE plus one audit_events INSERT inside one tenant-scoped session, the same cost profile as a suppression-rule write (ADR-0045). Alert-queue listing adds indexed filters (status, assigned_to_user_id, due_at) to a table that is tiny until W1 starts writing it. Not measured because the table currently has zero rows in every environment (no writers exist).
- Dependency surface: no new packages. Two new enums in trustrelay-models, one Alembic migration widening monitoring_alerts, one new service (alert_service.py), four disposition endpoints, one new Permission member. SAR linkage reuses the existing SARService.raise_sar (backend/app/services/sar_service.py:260).
- Debuggability: every transition lands as an immutable audit_events row (ADR-0064) carrying actor, from/to status, rationale, and evidence refs; closure reason is a typed enum, so MI counts (time-to-close, backlog aging, closure-reason mix) are a GROUP BY, not log archaeology.
- Reversibility: additive migration on an orphaned table; branch revert restores today's read-only surface. Nothing in the Temporal workflow changes. The one hard-to-reverse element is the audit trail itself, which is append-only by design.
- Blast radius: the alerts UI (which today polls a permanently empty table) gains disposition actions; W1's trigger_router_service becomes the writer; W3 (relationship lifecycle) and W6 (case-pack monitoring appendix) consume dispositions. Onboarding decision paths are untouched.
- Alternative considered: dropping the dead monitoring_alerts table and opening a case per detection. Rejected below (a full 12-step case per sanctions-list ping is response-inflation, and AMLA Guideline 2 asks for triage before escalation, not instead of it).
Context
AMLA's draft ongoing-monitoring guidelines (level-3 elaboration of AMLR (EU) 2024/1624 Art. 26; consultation closes 2026-09-03, comply-or-explain via NCAs from Q1 2027) require, per Guideline 2, a controlled process to assess, prioritise, escalate, close and evidence monitoring outputs. AMLA's framing at the 2026-07-02 hearing was blunt: what happens after an alert is generated is the effectiveness test — alert volume proves nothing; disposition discipline does.
The 2026-07-03 gap audit (docs/research/2026-07-03-amla-ongoing-monitoring-gap-analysis.md,
§3.1 "Alert lifecycle", §3.2.2/§3.2.3) verified the current state with file:line evidence:
- monitoring_alerts is an orphaned table. The ORM model exists (backend/app/db/models.py:3102-3124: MonitoringAlert with trigger_type, risk-score deltas, status defaulting to 'new' at :3121, and bare acknowledged_at/acknowledged_by columns) and has live readers (monitoring_service.py:286-290 lists by status, :318-319 counts status == "new" for the dashboard badge), but zero writers anywhere in the codebase. The alerts UI polls a permanently empty table.
- The only disposition primitive in the monitoring domain is a bare boolean. MonitoringService.acknowledge_event (backend/app/services/monitoring_service.py:83-108) sets acknowledged = True with no rationale, no evidence, and, despite the ADR-0064 infrastructure being available, no audit_events row. The endpoint (backend/app/api/monitoring.py:157-178) takes no body at all. There is no priority, no SLA, no assignment, no RBAC permission, and no path from a monitoring finding to a SAR: a CRITICAL sanctions hit on a monitored UBO becomes a list row an officer can silently tick away.
- The repo already contains a production-verified disposition shape, the ADR-0045 sanctions false-positive suppression workflow (backend/app/api/sanctions_suppression.py:65-239): mandatory rationale ≥10 chars + evidence_refs on create (:65-69, per EU AI Act Art. 13 traceability), 12-month expiry forcing officer re-review (sanctions_suppression_service.py:43 DEFAULT_RULE_LIFETIME, :47 RENEWAL_WINDOW), revoke-with-reason (:72-73, :189-222), and fire-count/housekeeping telemetry (:225-239). The gap audit's Tier 1.3 recommendation is to generalise exactly this shape onto monitoring_alerts.
This ADR is Wave 2 (amla-w2-alert-disposition) of the AMLA remediation architecture
(docs/superpowers/specs/2026-07-03-amla-remediation-architecture.md, §3 W2). It depends on
W1 (ADR-0083), which turns the trigger taxonomy into detection code and makes
trigger_router_service.py the first writer of monitoring_alerts. Without W2, W1's alerts
would reproduce the exact defect AMLA names: detections dying in a list. The design principle
binding both (architecture §1.2): Detection → Response — every detection routes somewhere
typed, and every response leaves an immutable record.
Decision
Generalise the ADR-0045 suppression disposition shape onto monitoring_alerts, giving every
monitoring alert a typed lifecycle with mandatory evidenced closure, assignment, SLA/aging,
an RBAC-gated permission, a SAR link, and MI counts. Vocabulary, columns, service, and
endpoints are exactly those fixed in the architecture document §2 — verbatim, no synonyms.
1. Lifecycle enums (architecture §2.1), added to
packages/trustrelay-models/src/trustrelay_models/monitoring.py (beside the existing
MonitoringCheckType, :20):
    class AlertStatus(str, Enum):
        new = "new"  # matches the existing server_default 'new' (db/models.py:3121)
        triaged = "triaged"
        escalated = "escalated"
        closed = "closed"

    class AlertClosureReason(str, Enum):
        resolved = "resolved"
        false_positive = "false_positive"
        escalated_sar = "escalated_sar"
        review_opened = "review_opened"
        duplicate = "duplicate"
Legal transitions: new → triaged → escalated → closed, plus new → closed and
triaged → closed (not every alert warrants escalation). closed is terminal — a wrongly
closed alert is not reopened in place; the underlying condition re-fires a new alert
(same append-only philosophy as ADR-0045 revoke: the record of the wrong decision stays).
No transition may skip the mandatory closure fields.
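The legal transitions above can be expressed as a small lookup table. The following is a minimal sketch, not the shipped service code: LEGAL_TRANSITIONS, IllegalTransition and assert_transition are hypothetical names, and the table reads the list literally, so new → escalated is rejected (whether the implementation allows that shortcut is not stated here).

```python
from enum import Enum


class AlertStatus(str, Enum):
    new = "new"
    triaged = "triaged"
    escalated = "escalated"
    closed = "closed"


# Outgoing edges per status, taken from the list above.
# closed maps to an empty set because it is terminal.
LEGAL_TRANSITIONS: dict[AlertStatus, frozenset[AlertStatus]] = {
    AlertStatus.new: frozenset({AlertStatus.triaged, AlertStatus.closed}),
    AlertStatus.triaged: frozenset({AlertStatus.escalated, AlertStatus.closed}),
    AlertStatus.escalated: frozenset({AlertStatus.closed}),
    AlertStatus.closed: frozenset(),
}


class IllegalTransition(ValueError):
    """Hypothetical error type for a move that is not in the table."""


def assert_transition(current: AlertStatus, target: AlertStatus) -> None:
    """Raise IllegalTransition unless current -> target is a listed edge."""
    if target not in LEGAL_TRANSITIONS[current]:
        raise IllegalTransition(f"{current.value} -> {target.value} is not allowed")
```

Keeping the edges in data rather than in per-method if-chains means the terminal rule for closed is enforced in one place.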
2. Schema (architecture §2.3): the next sequential Alembic revision (numbered at
implementation time — never hardcoded in the plan) widens monitoring_alerts with the W2
lifecycle columns: response_required (TriggerResponse, ADR-0083), priority int,
assigned_to_user_id UUID, due_at, closed_at, closed_by, closure_reason,
closure_rationale text, evidence_refs JSONB, sar_id, review_case_id,
source_event_id (back-ref to the originating monitoring_events row). trigger_type
already exists (:3116). The migration also enables RLS on monitoring_alerts in line with
ADR-0023/0050; every INSERT sets tenant_id explicitly (the PR #177 confirm-website lesson —
the existing server_default at :3110-3113 pins the default tenant and violates WITH CHECK
for every other tenant, so writers must never rely on it). Any raw-SQL JSONB parameter uses
CAST(:param AS jsonb), never ::jsonb (asyncpg).
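The two writer rules in this step (explicit tenant_id, CAST instead of ::jsonb) can be illustrated with a statement builder. This is a sketch under assumptions: build_alert_insert is a hypothetical helper, the column subset (including id) is illustrative rather than the real table definition, and the returned pair is meant to be handed to whatever parameterised-execute call the writer already uses.

```python
import json
import uuid
from typing import Optional

# Illustrative column subset only; the real monitoring_alerts table has more columns.
# The JSONB parameter goes through CAST(... AS jsonb), never the ::jsonb shorthand.
INSERT_ALERT_SQL = (
    "INSERT INTO monitoring_alerts (id, tenant_id, trigger_type, status, evidence_refs) "
    "VALUES (:id, :tenant_id, :trigger_type, 'new', CAST(:evidence_refs AS jsonb))"
)


def build_alert_insert(
    tenant_id: Optional[uuid.UUID],
    trigger_type: str,
    evidence_refs: list[str],
) -> tuple[str, dict[str, object]]:
    """Return (sql, params) for one alert row with tenant_id always bound."""
    if tenant_id is None:
        # The server_default pins the default tenant, so a missing tenant_id
        # is rejected here instead of being left to the database.
        raise ValueError("tenant_id must be set explicitly on every INSERT")
    params: dict[str, object] = {
        "id": str(uuid.uuid4()),
        "tenant_id": str(tenant_id),
        "trigger_type": trigger_type,
        "evidence_refs": json.dumps(evidence_refs),  # serialised for the jsonb cast
    }
    return INSERT_ALERT_SQL, params
```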
3. Disposition service — new backend/app/services/alert_service.py (architecture §2.4):
- assign(alert_id, user_id): assignment via the PR #153 officer-picker population (app users table).
- triage(alert_id, priority, note): new → triaged, sets priority and due_at.
- escalate(alert_id, target, rationale): → escalated; when target is SAR, calls SARService.raise_sar (sar_service.py:260, exposed at POST /cases/{case_id}/sar, backend/app/api/sar.py:88-118) to pre-populate a draft SAR carrying an alert_id back-ref, and stores the returned sar_id on the alert. The SAR then follows its own ADR-0071 lifecycle (MLRO four-eyes, tipping-off boundary); this ADR adds the missing link, not a parallel filing path. When target is a review case (W1 routing), review_case_id is stored instead.
- close(alert_id, closure_reason, closure_rationale, evidence_refs): rationale mandatory ≥10 chars (the ADR-0045 floor, api/sanctions_suppression.py:68), typed reason mandatory. Closing as false_positive requires at least one evidence ref; closing as escalated_sar/review_opened requires the corresponding sar_id/review_case_id to be set. This is fail-closed: the service rejects a closure that claims an escalation it cannot point at.
- SLA/aging: due_at derived from severity at write time (W1) or triage; the alert-queue endpoint computes overdue; MI counts expose backlog aging. Reuses the SLA vocabulary, not the case SLA rows.
- Every transition writes an immutable audit_events row (ADR-0064), with event types alert_assigned, alert_triaged, alert_escalated, alert_closed, carrying actor id, from/to status, rationale, evidence refs, and linked ids. The disposition columns on the alert row are the queryable state; audit_events is the evidence.
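The fail-closed closure rules can be sketched as one pure validation function that runs before any row is updated. This is an illustration only: AlertState, ClosureRejected and validate_closure are hypothetical names, and stripping whitespace before counting the rationale length is an assumption, not something the ADR specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class AlertClosureReason(str, Enum):
    resolved = "resolved"
    false_positive = "false_positive"
    escalated_sar = "escalated_sar"
    review_opened = "review_opened"
    duplicate = "duplicate"


MIN_RATIONALE_CHARS = 10  # the ADR-0045 floor


@dataclass
class AlertState:
    """Hypothetical in-memory stand-in for the link columns on the alert row."""
    sar_id: Optional[str] = None
    review_case_id: Optional[str] = None


class ClosureRejected(ValueError):
    """Hypothetical error type for a closure that fails validation."""


def validate_closure(
    alert: AlertState,
    reason: AlertClosureReason,
    rationale: str,
    evidence_refs: Sequence[str],
) -> None:
    """Raise ClosureRejected unless the closure carries everything it claims."""
    # Assumption: surrounding whitespace does not count toward the floor.
    if len(rationale.strip()) < MIN_RATIONALE_CHARS:
        raise ClosureRejected("closure_rationale must be at least 10 characters")
    if reason is AlertClosureReason.false_positive and not evidence_refs:
        raise ClosureRejected("false_positive requires at least one evidence ref")
    if reason is AlertClosureReason.escalated_sar and alert.sar_id is None:
        raise ClosureRejected("escalated_sar requires sar_id on the alert")
    if reason is AlertClosureReason.review_opened and alert.review_case_id is None:
        raise ClosureRejected("review_opened requires review_case_id on the alert")
```

Because the function touches no database state, each rule can be unit-tested in isolation and the service only proceeds to the UPDATE and audit write once it returns.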
4. API surface (architecture §2.5): POST /api/monitoring/alerts/{id}/assign · /triage
· /escalate · /close; GET /api/monitoring/alerts gains status / assigned / overdue
filters; a MI-counts endpoint returns time-to-close, backlog aging buckets, and
closure-reason mix per tenant (the supervisor-facing "effectiveness of disposition" numbers;
W6 renders them into the Monitoring Framework Record). All disposition endpoints are gated by
a new Permission.MONITORING_DISPOSE = "monitoring.dispose" in
backend/app/api/deps/permissions.py:42-98, granted at officer level and inherited upward
through the ADR-0074 strict-superset hierarchy (_OFFICER set, :70-77); auditor stays
read-only. RBAC Phase 2 is active (2026-06-28), so denial is a real 403.
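The MI numbers named in this step reduce to a few aggregations over the disposition columns. The sketch below computes them in memory to show the shape; in the service they would more plausibly be SQL aggregates. The function name mi_counts, the created_at field, the use of a median for time-to-close, and the aging bucket boundaries are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime
from statistics import median
from typing import Any, Optional


def _aging_bucket(age_days: int) -> str:
    # Hypothetical bucket boundaries; the ADR does not fix them.
    if age_days <= 2:
        return "0-2d"
    if age_days <= 7:
        return "3-7d"
    return "8d+"


def mi_counts(alerts: list[dict[str, Any]], now: datetime) -> dict[str, Any]:
    """Summarise one tenant's alerts: time-to-close, reason mix, backlog aging, overdue."""
    closed = [a for a in alerts if a["status"] == "closed"]
    backlog = [a for a in alerts if a["status"] != "closed"]

    hours_to_close = [
        (a["closed_at"] - a["created_at"]).total_seconds() / 3600 for a in closed
    ]
    median_hours: Optional[float] = median(hours_to_close) if hours_to_close else None

    return {
        "median_hours_to_close": median_hours,
        "closure_reason_mix": dict(Counter(a["closure_reason"] for a in closed)),
        "backlog_aging": dict(
            Counter(_aging_bucket((now - a["created_at"]).days) for a in backlog)
        ),
        # Overdue is computed at read time from due_at, as the queue endpoint does.
        "overdue": sum(
            1 for a in backlog if a["due_at"] is not None and a["due_at"] < now
        ),
    }
```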
5. UI: the existing alerts surface becomes an alert-queue tab with disposition actions — inline confirmation + Sonner toasts, no modal dialogs (S4U UI standard). Until W1 ships its writer, the W0 honest empty-state ("no alert engine events yet") stands; this ADR never fakes rows.
6. Boundary with the event acknowledge fix: W0 (architecture §3 W0 item b) separately
fixes acknowledge_event on monitoring_events (mandatory rationale + audit event). That
remains a lightweight informational-tier primitive; the full lifecycle in this ADR applies to
monitoring_alerts, the response spine. record_only routing (ADR-0083) still creates an
alert row so nothing bypasses the disposition trail.
Consequences
Positive
- Closes the AMLA Guideline 2 blocker verbatim: monitoring outputs are assessed (triage), prioritised (priority/due_at), escalated (SAR/review-case links), closed (typed reason + mandatory rationale), and evidenced (immutable audit rows + evidence_refs) — the comply-or-explain artifact an NCA will ask a tenant for.
- The monitoring→SAR gap (§3.1 "SAR never linked from a monitoring finding") closes by reusing the production ADR-0071 lifecycle rather than inventing a second filing path; the alert back-ref makes the chain detection→alert→SAR reconstructable end-to-end.
- Proven shape, not a novel design: rationale floor, evidence refs, revoke/close-with-reason and telemetry are all lifted from ADR-0045 code that has survived adversarial review.
- MI counts turn "we monitor" into measurable numbers (time-to-close, aging, reason mix) — AMLA's effectiveness-over-volume posture, and W6's framework record gets real data.
Negative
- Officer workload becomes visible and mandatory: every alert now demands a typed, evidenced disposition. A tenant with noisy W1 detection will face a real backlog queue where today they see a comfortable empty list. That is the point, but it is friction, and until detection precision is tuned it may push officers toward rote false_positive closures (mitigated, not eliminated, by the evidence-ref requirement and reason-mix MI).
- closed being terminal means an erroneous closure cannot be amended in place; the correction path (condition re-fires a new alert) depends on W1's detectors actually re-firing, and a one-shot trigger wrongly closed leaves only the audit trail as recourse.
- Two disposition primitives now coexist (event acknowledge-with-rationale from W0 vs the full alert lifecycle), which requires the UI and officer training to keep the informational vs actionable distinction crisp.
Neutral
- The table keeps its legacy acknowledged_at/acknowledged_by columns (:3123-3124) unused by the new lifecycle; they are not dropped in W2 (nothing ever wrote them, dropping is cosmetic and can ride any later migration).
- MI counts are per-tenant operational telemetry; no cross-tenant benchmarking is introduced (Pillar 6 was dropped; officers reject data sharing).
- W3 consumes escalated alerts for suspension recommendations and W6 renders dispositions into the case-pack monitoring appendix; both are additive consumers, no W2 rework expected.
Alternatives Considered
Alternative 1: delete the dead table; open a case per alert
Drop monitoring_alerts (zero writers, so no data loss) and route every detection straight
into a ComplianceCaseWorkflow review case. Rejected: it collapses AMLA's assess/prioritise
step into escalate — a sanctions-list update touching a monitored entity is not yet a
finding, and spawning a 12-step case per ping is response-inflation that would bury officers
and dilute real escalations. The architecture (§2.1 TriggerResponse) deliberately reserves
case-opening for full_kyc_refresh/targeted_update routing; record_only and triage-first
paths need a lighter, still-evidenced object. The table also already has live read surfaces
(monitoring_service.py:286-319, dashboard badge, UI) that wiring preserves and deleting
wastes.
Alternative 2: reuse follow-up tasks as alert triage
Represent monitoring outputs as generated follow-up tasks on the originating case (the
generate_follow_up_tasks machinery exists). Rejected: follow-up tasks are case-iteration
artifacts inside the onboarding loop — they carry no status machine beyond completion, no
closure reason, no SLA/aging, no SAR linkage, and no tenant-level queue across cases; and
post-approval the case workflow is terminal, so there is no iteration to attach them to
(gap doc §3.1 "an APPROVED relationship is frozen forever"). Bending them into a disposition
lifecycle would rebuild everything in this ADR inside a model shaped for something else.
Alternative 3: boolean-acknowledge++ (mandatory rationale, no lifecycle)
Extend the W0 acknowledge fix to alerts: keep new → acknowledged but require a rationale and
emit an audit event. Rejected as the terminal state, kept as the W0 stopgap for events: it
satisfies "evidence" but not "assess, prioritise, escalate, close" — no priority, no aging, no
assignment, no typed closure reason (so no MI reason-mix), and critically no SAR/review link,
leaving the detection→response chain broken exactly where the audit found it
(§3.1: "a CRITICAL sanctions hit becomes a list row"). AMLA's poor-practice example is
precisely alert handling that ends at acknowledgement.
Alternative 4: do nothing
Viable only until W1 ships — today the table is empty, so there is nothing to dispose.
Rejected because W1 (ADR-0083) makes trigger_router_service a writer; shipping detection
without disposition would manufacture the exact "detections die in a list" defect at larger
scale, and the comply-or-explain clock (final guidelines Q4 2026, NCA declarations Q1 2027)
runs regardless.