Investigation Confidence Scoring (Pillar 1)
Quantified certainty for every compliance investigation — replacing binary pass/fail with a 4-dimension confidence framework.
Business Value
Compliance officers need to understand not just what the AI found, but how certain it is. Confidence Scoring provides a 0-100 score decomposed into four independently measurable dimensions, enabling evidence-based decision-making.
Architecture
Confidence Dimensions
| Dimension | Range | Measures |
|---|---|---|
| Evidence Completeness | 0-25 | Coverage of required document categories |
| Source Diversity | 0-25 | Number and variety of independent sources |
| Consistency | 0-25 | Agreement between sources on key facts |
| Historical Calibration | 0-25 | Accuracy of similar past predictions |
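As a rough illustration of how four 0-25 dimensions combine into one 0-100 score, here is a minimal sketch. The `DimensionScores` class and its field names are hypothetical stand-ins, not the project's `ConfidenceScore` model, and it assumes the overall score is the plain sum of the dimensions, which the table implies but does not state.

```python
from dataclasses import dataclass

MAX_PER_DIMENSION = 25  # each dimension is scored 0-25 per the table above


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical container; the real ConfidenceScore model may differ."""

    evidence_completeness: float
    source_diversity: float
    consistency: float
    historical_calibration: float

    def __post_init__(self) -> None:
        # Reject any dimension outside its documented 0-25 range.
        for name, value in vars(self).items():
            if not 0 <= value <= MAX_PER_DIMENSION:
                raise ValueError(
                    f"{name} must be in 0-{MAX_PER_DIMENSION}, got {value}"
                )

    @property
    def total(self) -> float:
        """Overall 0-100 score, assumed here to be the sum of the dimensions."""
        return (
            self.evidence_completeness
            + self.source_diversity
            + self.consistency
            + self.historical_calibration
        )
```

For example, `DimensionScores(22, 18, 20, 15).total` evaluates to `75`, which falls in the MEDIUM band.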
Confidence Levels
| Level | Score Range | Action |
|---|---|---|
| HIGH | 85-100 | Automated approval eligible |
| MEDIUM | 65-84 | Standard review |
| LOW | 40-64 | Enhanced review recommended |
| INSUFFICIENT | 0-39 | Additional investigation required |
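The banding in this table can be expressed as a small threshold lookup. The sketch below is a plain-Python stand-in for `level_from_score()` using the documented 85/65/40 cut-offs; the enum shown here is an assumption and is not the Pydantic `ConfidenceLevel` model from the shared package.

```python
from enum import Enum


class ConfidenceLevel(str, Enum):
    """Stand-in enum; the real model lives in the shared models package."""

    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    INSUFFICIENT = "INSUFFICIENT"


def level_from_score(score: float) -> ConfidenceLevel:
    """Map a 0-100 score to a level using the 85/65/40 thresholds."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0-100, got {score}")
    if score >= 85:
        return ConfidenceLevel.HIGH
    if score >= 65:
        return ConfidenceLevel.MEDIUM
    if score >= 40:
        return ConfidenceLevel.LOW
    return ConfidenceLevel.INSUFFICIENT
```

Checking the thresholds from highest to lowest keeps each boundary value (85, 65, 40) in the upper of its two neighbouring bands, matching the ranges in the table.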
Workflow Integration
Confidence scoring is invoked through the `_compute_and_store_confidence()` helper method, which was extracted from the workflow's `run` method during the codebase hardening sweep (change I6). This helper is shared between the KYC and KYB investigation paths: both call it after their respective investigation activities complete.
```python
# Shared for both KYC and KYB paths
await self._compute_and_store_confidence(
    input, investigation_result, retry_policy
)
```
The helper:
- Checks the `confidence-score-v1` version gate (skipped for old workflow histories)
- Calls the `compute_confidence_score` activity with a 30-second timeout
- Appends the result to `self._state.confidence_scores`
- Logs a `confidence_computed` audit event
- Swallows all exceptions (confidence scoring is best-effort: a scoring failure never blocks case progression)
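The steps above follow a common best-effort pattern: gate, call with a timeout, record, and never propagate failures. The sketch below shows that pattern with plain `asyncio` rather than the Temporal SDK, so it is not the workflow's actual helper. The function name, its parameters, and the list-based state and audit log are all hypothetical.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)


async def compute_and_store_best_effort(
    compute: Callable[[], Awaitable[Any]],  # stand-in for the scoring activity
    scores: list,                           # stand-in for workflow state
    audit_log: list,                        # stand-in for the audit trail
    version_gate_open: bool,                # stand-in for the version gate
    timeout_seconds: float = 30.0,
) -> None:
    # 1. Skip entirely when the version gate is closed (old histories).
    if not version_gate_open:
        return
    try:
        # 2. Run the scoring call under a timeout.
        result = await asyncio.wait_for(compute(), timeout=timeout_seconds)
        # 3. Store the result, then 4. record an audit event.
        scores.append(result)
        audit_log.append({"event": "confidence_computed", "score": result})
    except Exception:
        # 5. Best-effort: log and continue so the case is never blocked.
        logger.warning("confidence scoring failed; continuing", exc_info=True)
```

Catching `Exception` rather than `BaseException` means task cancellation still propagates in this sketch, while timeouts and scoring errors are logged and dropped.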
Prior to I6, confidence scoring was duplicated inline in both the KYC and KYB branches. Extracting it to `_compute_and_store_confidence()` eliminates the duplication and ensures both paths always score using identical logic.
Key Components
- `confidence_engine.py`: Core scoring engine with dimensional computation. The `ConfidenceScore` / `ConfidenceLevel` Pydantic models live in the shared `trustrelay_models.confidence` package (ADR-0037); `level_from_score()` applies the 85/65/40 thresholds.
- `calibration_service.py`: Feedback loop. Officer decisions are recorded via `record_data_point()` and surfaced via `get_calibration_stats()`, feeding the Historical Calibration dimension.
- `quality_scorer.py`: LLM-as-judge quality scoring used alongside the deterministic confidence engine.
- `ConfidenceScoreCard.tsx`: Visual breakdown in the case detail view.
API Endpoint
The confidence router is mounted under `/api/cases` (`app/api/confidence.py`):
| Method | Path | Description |
|---|---|---|
| GET | /api/cases/{workflow_id}/confidence | Get the latest 4-dimension confidence breakdown for a case |
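A client might call this endpoint roughly as follows. This is a sketch using only the standard library. The base URL is a placeholder, and the helper names, the absence of authentication, and the JSON response body are assumptions, since the response schema is not described here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def confidence_url(base_url: str, workflow_id: str) -> str:
    """Build the GET URL for a case's confidence breakdown."""
    # Percent-encode the ID so characters like "/" cannot alter the path.
    encoded_id = quote(workflow_id, safe="")
    return f"{base_url.rstrip('/')}/api/cases/{encoded_id}/confidence"


def fetch_confidence(base_url: str, workflow_id: str, timeout: float = 10.0):
    """Fetch and decode the breakdown (assumes JSON and no auth header)."""
    with urlopen(confidence_url(base_url, workflow_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Real deployments will likely need an authorization header and error handling for non-200 responses, both of which are omitted here.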
Calibration is not exposed as a standalone REST surface: officer decisions feed
`CalibrationService.record_data_point()` internally from the decision flow, and
`get_calibration_stats()` supplies the Historical Calibration dimension at scoring time.
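To make the feedback loop concrete, here is a minimal in-memory sketch. Only the method names `record_data_point()` and `get_calibration_stats()` come from this document. Their signatures, the stats dictionary, and the mapping from agreement rate to a 0-25 value (including the neutral default when no history exists) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CalibrationSketch:
    """Hypothetical in-memory stand-in for CalibrationService."""

    _outcomes: list = field(default_factory=list)

    def record_data_point(self, prediction_confirmed: bool) -> None:
        # One entry per officer decision: did it confirm the prediction?
        self._outcomes.append(prediction_confirmed)

    def get_calibration_stats(self) -> dict:
        total = len(self._outcomes)
        rate: Optional[float] = sum(self._outcomes) / total if total else None
        return {"total": total, "agreement_rate": rate}


def historical_calibration_points(
    stats: dict, max_points: float = 25.0, neutral: float = 12.5
) -> float:
    """Scale the agreement rate onto the 0-25 dimension (assumed mapping)."""
    rate = stats["agreement_rate"]
    if rate is None:
        return neutral  # no history yet: assume a mid-range value
    return round(rate * max_points, 2)
```

With three confirmed predictions out of four, the agreement rate is 0.75 and the sketch yields 18.75 of 25 points.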
Configuration
- The confidence score is computed by the `compute_confidence_score` Temporal activity and is best-effort: a scoring failure never blocks case progression. There is no dedicated `confidence_scoring_enabled` feature flag; scoring runs as part of the workflow, gated by the `confidence-score-v1` workflow version guard.
- Alembic migration: `006_calibration_data`