ADR-0036: PII Classification with Hybrid Annotations and Generated Manifest
Status: Accepted Date: 2026-04-07
Context
Trust Relay stores PII fields across 8+ tables -- all in plaintext PostgreSQL columns. While Row-Level Security (ADR-0023) enforces tenant isolation at the database level, there is no field-level awareness of what data is personally identifiable. This creates three gaps:
- Security questionnaires -- regulated customers ask "where is PII stored and how is it protected?" and there is no machine-readable answer.
- GDPR Art. 30 -- records of processing activities require a formal inventory of personal data fields, their purposes, and retention periods.
- Encryption at rest -- direct identifiers (national IDs, IBANs, document numbers) are stored in plaintext, which is a finding in any security audit.
Decision
Implement a hybrid annotation + generated manifest architecture in three phases:
Phase 1 (this ADR): PII classification metadata on SQLAlchemy columns via info={"pii": PIIField(...)}, a PIIRegistry that scans models at runtime, and a CLI that generates docs/pii_manifest.json -- the GDPR Art. 30 artifact.
Phase 2: Alembic migration to convert scalar PII columns to EncryptedText (AES-256-GCM via TypeDecorator), with HMAC search hashes for indexed lookups.
Phase 3: GDPR data subject request endpoints (access, erasure, rectification) using a person_data_index table for cross-case person resolution.
Key design choices:
-
Annotations on models, not a separate registry file -- PII classification lives next to the column definition, making it hard to miss when adding new fields.
-
Six PII categories --
DIRECT_IDENTIFIER,QUASI_IDENTIFIER,SENSITIVE,FINANCIAL,CONTACT,DOCUMENT. Three of these (DIRECT_IDENTIFIER, FINANCIAL, CONTACT) require column-level encryption. -
Vault-ready
KeyProviderinterface --EnvKeyProviderreads AES-256 keys from environment variables. The abstraction allows swapping to HashiCorp Vault or AWS KMS without changing the encryption layer. -
Generated manifest, not hand-maintained --
python -m app.pii.manifestscans all annotated models and producesdocs/pii_manifest.json. -
Dev/test bypass --
pii_encryption_enabled=FalsemakesEncryptedTextbehave as plainText, avoiding key management overhead in development.
Consequences
Positive
- Bank security questionnaires can be answered with a single JSON file
- GDPR Art. 30 records of processing generated from code, not maintained manually
- Encryption architecture is Vault-ready from day one
- Zero impact on existing queries --
EncryptedTextis transparent to application code
Negative
- JSONB fields containing PII arrays cannot use column-level
EncryptedText-- require application-level encryption in Phase 2 - Dev/test bypass means plaintext in non-production databases
Risks
info={}metadata is not enforced by SQLAlchemy -- a developer could add a PII column without the annotation. Mitigated by CI drift detection in Phase 2.EnvKeyProviderstores the key in an env var which may appear in process listings. Production deployments should useVaultKeyProvider.