ADR-0051: Tenant-scoped session selection for RLS-enforced access
Status: Accepted
Date: 2026-06-13
Supersedes: none
Superseded by: none
Context
Once RLS was actually enforced (ADR-0050), a 9-agent audit (§10 of the internal
gaps register) classified all 119 get_session() call-sites: 14 P0 writes
that the RLS WITH CHECK would reject, 35 P1 reads that returned the demo
tenant's rows for a non-demo request, 16 P2 shared-reference reads that work
via demo-seeding, 4 ambiguous, and 42 genuinely safe. get_session()
pins the GUC to the demo tenant — correct for the single-tenant demo, wrong for
any real second tenant.
The flip also exposed two latent bugs invisible to the prior (mocked-DB) suite:
- `DiagnosticService.record_stage` cast a VARCHAR `case_id` to `uuid`; every
  real INSERT raised `invalid input syntax for type uuid`, the
  guard-and-swallow ate it, and `pipeline_stages` had 0 rows in production.
- `agent_progress` / `tool_audit` / `automation_tier` writes hardcoded
  `DEMO_TENANT_ID` (or relied on a demo `server_default`), collapsing every
  tenant's rows onto the demo tenant.
Decision
Select the session factory by where the tenant comes from, not by habit:
- User-asserted tenant (officer JWT, activity `input.tenant_id`): use
  `get_tenant_session(tenant)`. RLS `WITH CHECK` gives defense-in-depth on
  writes; reads are correctly scoped. Thread `tenant_id` through the call
  chain when needed.
- System-derived tenant (the row's tenant is its case's tenant): derive it
  from the authoritative row under `get_admin_session()`. Use
  `INSERT … SELECT c.case_id, …, c.tenant_id FROM cases c WHERE c.case_id = :id`
  when the case must exist, or
  `COALESCE((SELECT tenant_id FROM cases WHERE case_id = :case_id_lookup), demo)`
  for case-optional events. Use a separate bind for the subquery: reusing one
  `:case_id` across an INSERT column and a subquery predicate makes asyncpg
  raise `AmbiguousParameterError` (text vs character varying).
- System read by a globally-unique key where the caller pre-verifies
  ownership (workflow_id, capsule_id, case_id), or a genuinely shared/global
  read (regulatory KB, cross-case cost aggregate): use `get_admin_session()`.
- Genuinely safe (non-RLS tables like `tenants` / `reasoning_templates`, or a
  policy that already admits `tenant_id IS NULL` globals like `cost_pricing`):
  leave `get_session()`, verified rather than assumed.
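The four cases above amount to a small decision table. The following is a minimal sketch of it, assuming a hypothetical `TenantOrigin` enum and `factory_for` helper that this ADR does not define. The factory names are returned as strings so the sketch does not depend on the real session module or its signatures.

```python
from enum import Enum


class TenantOrigin(Enum):
    """Hypothetical labels for where a call-site's tenant comes from."""

    USER_ASSERTED = "user_asserted"        # officer JWT, activity input.tenant_id
    SYSTEM_DERIVED = "system_derived"      # the row's tenant is its case's tenant
    GLOBAL_OR_SHARED = "global_or_shared"  # pre-verified unique key, or shared read
    VERIFIED_SAFE = "verified_safe"        # non-RLS table, or policy admits NULL tenant


# Session factory names taken from this ADR, keyed by tenant origin.
FACTORY_BY_ORIGIN = {
    TenantOrigin.USER_ASSERTED: "get_tenant_session",
    TenantOrigin.SYSTEM_DERIVED: "get_admin_session",
    TenantOrigin.GLOBAL_OR_SHARED: "get_admin_session",
    TenantOrigin.VERIFIED_SAFE: "get_session",
}


def factory_for(origin: TenantOrigin) -> str:
    """Return the name of the session factory prescribed for this origin."""
    return FACTORY_BY_ORIGIN[origin]
```

A table like this could back a review checklist or a lint rule. The classification of each call-site still has to be verified by hand, as the last bullet requires.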
case_id columns are VARCHAR (values like CASE-AWDC-…), never UUID;
cases.id is the UUID PK. Never cast case_id to uuid.
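The two system-derived patterns can be sketched end to end as follows. This is an illustration under stated assumptions: the `example_events` table, its `note` column, the `DEMO_TENANT_ID` value, and the helper names are hypothetical, and `sqlite3` stands in for Postgres only so the sketch runs anywhere. SQLite will not reproduce the asyncpg `AmbiguousParameterError`; the point is the shape of the statements: `case_id` stays text, and the subquery gets its own bind name.

```python
import sqlite3

DEMO_TENANT_ID = "demo-tenant"  # placeholder value, not the project's constant

# Case must exist: the tenant is copied from the cases row, so an unknown
# case_id inserts zero rows instead of a mis-attributed one.
INSERT_FOR_EXISTING_CASE = """
INSERT INTO example_events (case_id, note, tenant_id)
SELECT c.case_id, :note, c.tenant_id
FROM cases c
WHERE c.case_id = :id
"""

# Case-optional: fall back to the demo tenant when no case matches.
# :case_id and :case_id_lookup carry the same value under distinct names.
INSERT_CASE_OPTIONAL = """
INSERT INTO example_events (case_id, note, tenant_id)
VALUES (
    :case_id,
    :note,
    COALESCE(
        (SELECT tenant_id FROM cases WHERE case_id = :case_id_lookup),
        :demo_tenant
    )
)
"""


def case_optional_params(case_id: str, note: str) -> dict:
    """Build binds with a separate name for the subquery predicate."""
    return {
        "case_id": case_id,
        "note": note,
        "case_id_lookup": case_id,
        "demo_tenant": DEMO_TENANT_ID,
    }


def run_demo() -> list:
    """Insert three events and return (note, tenant_id) pairs in order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (case_id TEXT UNIQUE, tenant_id TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE example_events (case_id TEXT, note TEXT, tenant_id TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO cases VALUES ('CASE-EXAMPLE-1', 'tenant-b')")
    conn.execute(INSERT_FOR_EXISTING_CASE, {"id": "CASE-EXAMPLE-1", "note": "existing"})
    conn.execute(INSERT_CASE_OPTIONAL, case_optional_params("CASE-EXAMPLE-1", "known"))
    conn.execute(INSERT_CASE_OPTIONAL, case_optional_params("CASE-EXAMPLE-404", "unknown"))
    rows = conn.execute(
        "SELECT note, tenant_id FROM example_events ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows
```

In this sketch the two events for the known case are attributed to `tenant-b`, and only the event with no matching case falls back to the demo tenant.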
Consequences
- All P0 writes and P1 reads are remediated across ~30 modules in three
  batches; the P2 tail is resolved (most verified safe, the rest moved to
  admin or threaded). The remediation landed on `fix/code-review-2026-06-11`
  (PR #21).
- Testing rule (name-the-oracle): any SQL path a mock would otherwise stand
  in for ships a marker-gated real-DB test that seeds a non-demo tenant
  and asserts persistence + correct tenant attribution + RLS read isolation
  (`test_diagnostic_rls_persistence.py`,
  `test_agent_progress_rls_persistence.py`). A mocked-DB test plus a
  guard-and-swallow `except` is an information black hole.
- Reports now surface `governance_decisions` / `evoi_decisions` (the
  persistence fix unblocked them), and restriction decisions carry
  `evidence_refs` provenance.
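The testing rule above (name-the-oracle) can be illustrated with a small sketch of why a mocked-DB test plus a guard-and-swallow `except` hides a failed write. Everything here is hypothetical: the `record_stage` function is a stand-in rather than the project's `DiagnosticService.record_stage`, the `stages` table is invented, and `sqlite3` with a `CHECK` constraint only mimics a uuid-typed column rejecting a text case identifier.

```python
import logging
import sqlite3
from unittest.mock import MagicMock

logger = logging.getLogger("rls_example")
logger.addHandler(logging.NullHandler())  # keep the swallowed traceback quiet


def record_stage(conn, case_ref: str, tenant_id: str) -> None:
    """Hypothetical writer that guards its INSERT and swallows any failure."""
    try:
        conn.execute(
            "INSERT INTO stages (case_ref, tenant_id) VALUES (?, ?)",
            (case_ref, tenant_id),
        )
    except Exception:
        logger.exception("stage write failed")  # logged, never re-raised


def make_real_db() -> sqlite3.Connection:
    """Create an in-memory database whose schema rejects text case ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stages ("
        " case_ref TEXT NOT NULL CHECK (length(case_ref) = 36),"
        " tenant_id TEXT NOT NULL)"
    )
    return conn


def persisted_rows(conn, tenant_id: str) -> int:
    """The oracle: count rows actually stored for one tenant."""
    row = conn.execute(
        "SELECT COUNT(*) FROM stages WHERE tenant_id = ?", (tenant_id,)
    ).fetchone()
    return row[0]


def mocked_check_passes() -> bool:
    """A mock accepts any SQL, so a call-count assertion succeeds."""
    mock_conn = MagicMock()
    record_stage(mock_conn, "CASE-EXAMPLE-1", "tenant-b")
    return mock_conn.execute.call_count == 1


def real_db_row_count() -> int:
    """Against a real engine the insert fails and nothing is stored."""
    conn = make_real_db()
    record_stage(conn, "CASE-EXAMPLE-1", "tenant-b")
    count = persisted_rows(conn, "tenant-b")
    conn.close()
    return count
```

Here the mocked check reports success while the real-engine check finds zero rows for `tenant-b`, which is the gap the marker-gated real-DB tests are meant to close. A production test would run against Postgres with RLS enabled and would also assert that a different tenant's session cannot read the row.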