Session-Based Investigation Diagnostics

Pipeline-level telemetry, officer quality feedback, and automated failure classification -- making every compliance investigation iteration fully reconstructable and diagnosable.

Business Value

When an investigation produces poor results -- a hallucinated finding, a missing data source, or an incomplete analysis -- the compliance officer currently has no way to understand why without digging through logs. Session diagnostics solves this by recording structured telemetry at every pipeline stage and joining it into a single reconstruction view per iteration. Officers can instantly see which stage failed, how long each step took, which OSINT tools were invoked, and what governance checks ran.

Beyond individual case troubleshooting, aggregated diagnostics reveal systemic patterns: a data source that times out 40% of the time, a document type that consistently fails conversion, or an agent that produces low-quality results for a specific country. The automated failure classifier categorizes negative feedback into root causes without LLM calls, enabling proactive infrastructure and configuration fixes.

Architecture

Pipeline Stages

The @diagnostic_stage decorator automatically records timing, status, and details for each stage. All 12 stages map to the investigation pipeline:

| Seq | Stage Name | Temporal Activity | Details Captured |
|-----|------------|-------------------|------------------|
| 1 | portal_upload | Document upload signal | File count, total size |
| 2 | doc_processing | process_documents | Docling conversion results |
| 3 | doc_validation | validate_documents | Required vs. uploaded docs |
| 4 | osint_investigation | run_osint_investigation | Agent results, sources consulted |
| 5 | mcc_classification | classify_mcc | MCC code, confidence |
| 6 | risk_assessment | assess_risk | Risk score, red flags |
| 7 | task_generation | generate_follow_up_tasks | Task count, categories |
| 8 | confidence_scoring | score_confidence | 4-dimension scores |
| 9 | red_flag_evaluation | evaluate_red_flags | Flags triggered, actions |
| 10 | synthesis | synthesize_report | Report length, key findings |
| 11 | automation_tier | assign_automation_tier | Tier assigned, reason |
| 12 | officer_review | Officer decision signal | Decision, follow-up tasks |

Each stage record includes: status (success/failed), duration_ms, details (JSONB), error_type (on failure), and parent_stage_id (for sub-stage nesting).
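
The recording behaviour described above can be illustrated with a minimal sketch. This is not the platform's implementation: the `record_stage` sink, the `RECORDED_STAGES` list, and the decorator's `(stage, sequence)` signature are assumptions made for illustration, the example is synchronous, and the `details` payload and `parent_stage_id` are omitted. The return values of `classify_mcc` are placeholders.

```python
import functools
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("diagnostics")

# Hypothetical in-memory sink; a real system would write to pipeline_stages.
RECORDED_STAGES: list[dict[str, Any]] = []


def record_stage(row: dict[str, Any]) -> None:
    """Store one stage record (stand-in for the database write)."""
    RECORDED_STAGES.append(row)


def diagnostic_stage(stage: str, sequence: int) -> Callable:
    """Record status, duration_ms, and error_type for the wrapped stage."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            started = time.perf_counter()
            status, error_type = "success", None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                status, error_type = "failed", type(exc).__name__
                raise  # the stage's own error still propagates to the caller
            finally:
                # Runs on success and failure alike, so timing is always captured.
                row = {
                    "stage": stage,
                    "sequence": sequence,
                    "status": status,
                    "duration_ms": int((time.perf_counter() - started) * 1000),
                    "error_type": error_type,
                }
                try:
                    record_stage(row)
                except Exception:
                    # A telemetry failure must never break the stage itself.
                    logger.debug("stage recording failed", exc_info=True)

        return wrapper

    return decorator


@diagnostic_stage("mcc_classification", sequence=5)
def classify_mcc(description: str) -> dict[str, Any]:
    """Placeholder stage body; the returned values are illustrative only."""
    if not description:
        raise ValueError("empty description")
    return {"mcc_code": "5812", "confidence": 0.9}
```

Calling `classify_mcc("restaurant")` appends a record with status `success`; calling it with an empty string appends a record with status `failed` and error_type `ValueError`, and the exception still reaches the caller.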

Failure Taxonomy

The FailureClassifier is a deterministic, rule-based engine (no LLM calls) that classifies negative feedback (rating 1-2) into root causes. Priority-ordered evaluation -- first match wins:

| Priority | Root Cause | Severity | Trigger | Suggested Action |
|----------|------------|----------|---------|------------------|
| 1 | llm_hallucination | high | Officer selects "hallucination" category | Review evidence bundles for ungrounded claims |
| 1 | extraction_failure | high | Officer selects "wrong_entity" category | Check entity extraction and registration number matching |
| 1 | investigation_incomplete | medium | Officer selects "missing_source" or "incomplete" | Review template requirements vs. sources consulted |
| 1 | data_staleness | medium | Officer selects "outdated_data" category | Check cache TTLs and data freshness policies |
| 2 | document_unreadable | medium | doc_processing stage failed | Check document format; verify Docling compatibility |
| 2 | missing_document | medium | doc_validation stage failed | Verify required documents list in template |
| 3 | source_timeout | low | Tool invocation failed with timeout error | Retry investigation; check source response times |
| 3 | source_unavailable | low | Tool invocation failed with connection/HTTP error | Check endpoint availability and API credentials |
| 4 | investigation_incomplete | medium | Fewer than 3 OSINT sources succeeded | Review agent configuration and source availability |
| 5 | unknown | low | No rule matched | Manual review required |
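
The priority-ordered, first-match-wins evaluation in the table above might look roughly like the sketch below. The `FeedbackContext` shape, its field names, the `classify` function, and the error-kind strings (`timeout`, `connection`, `http`) are all assumptions for illustration. The table does not say how ties within one priority level are broken; this sketch assumes table order.

```python
from dataclasses import dataclass, field
from typing import Optional

# Officer-selected category -> (root_cause, severity), in table order (priority 1).
CATEGORY_RULES = {
    "hallucination": ("llm_hallucination", "high"),
    "wrong_entity": ("extraction_failure", "high"),
    "missing_source": ("investigation_incomplete", "medium"),
    "incomplete": ("investigation_incomplete", "medium"),
    "outdated_data": ("data_staleness", "medium"),
}

# Failed pipeline stage -> (root_cause, severity) (priority 2).
STAGE_RULES = {
    "doc_processing": ("document_unreadable", "medium"),
    "doc_validation": ("missing_document", "medium"),
}


@dataclass
class FeedbackContext:
    """Hypothetical classifier input; every field name is an assumption."""

    rating: int
    categories: list[str] = field(default_factory=list)
    failed_stages: list[str] = field(default_factory=list)
    tool_error_kinds: list[str] = field(default_factory=list)
    osint_sources_succeeded: int = 0


def classify(ctx: FeedbackContext) -> Optional[tuple[str, str]]:
    """Return (root_cause, severity), or None when the rating is not negative."""
    if ctx.rating > 2:
        return None  # only ratings 1-2 are classified
    # Priority 1: categories the officer selected explicitly.
    for category, result in CATEGORY_RULES.items():
        if category in ctx.categories:
            return result
    # Priority 2: document stages that failed.
    for stage, result in STAGE_RULES.items():
        if stage in ctx.failed_stages:
            return result
    # Priority 3: failed tool invocations, timeouts checked first.
    if "timeout" in ctx.tool_error_kinds:
        return ("source_timeout", "low")
    if "connection" in ctx.tool_error_kinds or "http" in ctx.tool_error_kinds:
        return ("source_unavailable", "low")
    # Priority 4: too few OSINT sources succeeded.
    if ctx.osint_sources_succeeded < 3:
        return ("investigation_incomplete", "medium")
    # Priority 5: no rule matched.
    return ("unknown", "low")
```

Because evaluation stops at the first match, feedback tagged "hallucination" on an iteration whose doc_processing stage also failed is classified as llm_hallucination, not document_unreadable.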

Session Reconstruction

The reconstruction endpoint (GET /api/diagnostics/{case_id}/iterations/{iteration}, in app/api/diagnostics.py) joins across all 6 telemetry tables to produce a complete investigation session view:

| Table | Key Columns | Join Key |
|-------|-------------|----------|
| pipeline_stages | stage, sequence, status, duration_ms, details, error_type | case_id (UUID) + iteration |
| tool_invocations | agent_name, tool_name, cost_category, duration_ms, success, cost_eur | case_id (VARCHAR) + iteration |
| audit_events | event_type, details | case_id (VARCHAR), case-level |
| governance_events | event_type, mechanism, agent_name, approved, violations | case_id (VARCHAR) + iteration |
| evoi_decisions | step_number, decision_type, candidate_agent, evoi_value, decision | case_id (VARCHAR) + iteration |
| investigation_feedback | officer_id, rating, categories, root_cause, severity | case_id (UUID) + iteration |

The response includes total_duration_ms computed as the sum of all stage durations.
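
As an in-memory illustration of the join described above, the sketch below filters rows from the six tables by case and iteration and derives total_duration_ms. It is not the endpoint's actual query: the `reconstruct` function and the dict-of-rows input are hypothetical. It does show two details from the table: case_id is stored as UUID in some tables and VARCHAR in others, so the keys are normalised to strings before comparison, and audit_events is matched on case_id only.

```python
import uuid
from typing import Any

Row = dict[str, Any]

TABLES = (
    "pipeline_stages", "tool_invocations", "audit_events",
    "governance_events", "evoi_decisions", "investigation_feedback",
)
CASE_LEVEL = {"audit_events"}  # no iteration column; joined on case_id alone


def reconstruct(tables: dict[str, list[Row]], case_id: uuid.UUID, iteration: int) -> Row:
    """Assemble one iteration's view from rows of the six telemetry tables."""
    key = str(case_id)  # normalise: UUID and VARCHAR keys compare as strings
    view: Row = {"case_id": key, "iteration": iteration}
    for name in TABLES:
        view[name] = [
            row for row in tables.get(name, [])
            if str(row["case_id"]) == key
            and (name in CASE_LEVEL or row["iteration"] == iteration)
        ]
    view["pipeline_stages"].sort(key=lambda row: row["sequence"])
    # total_duration_ms is the sum of all stage durations for this iteration.
    view["total_duration_ms"] = sum(r["duration_ms"] for r in view["pipeline_stages"])
    return view


case = uuid.UUID("12345678-1234-5678-1234-567812345678")
sample = {
    "pipeline_stages": [
        {"case_id": case, "iteration": 1, "sequence": 2,
         "stage": "doc_processing", "duration_ms": 1200},
        {"case_id": case, "iteration": 1, "sequence": 1,
         "stage": "portal_upload", "duration_ms": 300},
        {"case_id": case, "iteration": 2, "sequence": 1,
         "stage": "portal_upload", "duration_ms": 999},
    ],
    "audit_events": [{"case_id": str(case), "event_type": "case_opened"}],
}
view = reconstruct(sample, case, iteration=1)
print(view["total_duration_ms"])  # 1500
```

In a database, the same normalisation would typically be an explicit cast on one side of each join condition.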

Integration Points

  • Pillar 1 (Confidence Scoring) -- Stage 8 (confidence_scoring) records the 4-dimension confidence breakdown. Low confidence scores correlate with negative feedback, enabling calibration via CalibrationService.
  • Pillar 3 (Cross-Case Patterns) -- Aggregated failure patterns from GET /stats feed into PatternEngine to detect systemic issues (e.g., a source failing across multiple cases).
  • Pillar 4 (Supervised Autonomy) -- Stage 11 (automation_tier) records the tier assignment. Negative feedback on auto-processed cases triggers tier downgrade via rolling window analysis.
  • Audit compliance -- All telemetry tables carry tenant_id and created_at columns for retention compliance. The reconstruction read path scopes queries via get_tenant_session(tenant_id). Note: the stage and feedback write paths (record_stage, feedback recording) currently open get_session(), so current_setting('app.current_tenant') resolves to the demo tenant default — telemetry rows are not yet correctly tagged with the case's own tenant. This shares the same root cause as the workflow audit-event tenant gap (see EU AI Act Compliance → Known Gaps).
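
To make the cross-case aggregation concrete, here is a rough sketch of computing per-tool failure rates from tool_invocations rows and keeping only tools that failed in more than one case. It does not describe how GET /stats or PatternEngine actually work: the `failure_rates` function, the `min_cases` threshold, and the tool names in the sample data are assumptions.

```python
from collections import defaultdict
from typing import Any


def failure_rates(invocations: list[dict[str, Any]], min_cases: int = 2) -> dict[str, float]:
    """Return {tool_name: failure_rate} for tools failing in >= min_cases distinct cases."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    failing_cases: dict[str, set] = defaultdict(set)
    for inv in invocations:
        tool = inv["tool_name"]
        totals[tool] += 1
        if not inv["success"]:
            failures[tool] += 1
            failing_cases[tool].add(inv["case_id"])
    return {
        tool: failures[tool] / totals[tool]
        for tool in totals
        if len(failing_cases[tool]) >= min_cases
    }


# Hypothetical rows using the case_id, tool_name, and success columns.
rows = [
    {"case_id": "a", "tool_name": "gazette_lookup", "success": False},
    {"case_id": "b", "tool_name": "gazette_lookup", "success": False},
    {"case_id": "c", "tool_name": "gazette_lookup", "success": True},
    {"case_id": "d", "tool_name": "gazette_lookup", "success": True},
    {"case_id": "e", "tool_name": "gazette_lookup", "success": True},
    {"case_id": "a", "tool_name": "registry_search", "success": False},
    {"case_id": "b", "tool_name": "registry_search", "success": True},
]
print(failure_rates(rows))  # {'gazette_lookup': 0.4}
```

The distinct-case threshold keeps one bad case with many retries from looking like a systemic problem.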

Configuration

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| diagnostics_enabled | bool | true | Master feature flag. When false, all recording and API endpoints are disabled. |
| diagnostics_auto_classify | bool | true | Automatically run FailureClassifier when an officer submits feedback with rating 1-2. |
| diagnostics_feedback_prompt | bool | true | Prompt officers to provide quality feedback after reviewing investigation results. |

All three flags are set via environment variables or .env file, managed by pydantic-settings in app/config.py.

Planned: Context-Enriched Support Tickets

The session diagnostics system captures the richest investigation context in the platform — 12 pipeline stages, tool invocations, governance checks, EVOI decisions, and officer feedback, all joinable into a single reconstruction. This context is the foundation for a support ticket system where issues are pre-investigated before a support officer ever sees them.

The Vision

When an officer encounters a problem (a hallucinated finding, a missing data source, a failed document conversion), they should be able to create a support ticket directly from the case interface. The ticket automatically inherits the full diagnostic context for that iteration.

Why This Matters

Traditional support systems require the user to describe the problem, the support engineer to reproduce it, and both to spend time on context that the system already has. With session diagnostics, the platform knows:

  • What failed — which pipeline stage, which tool invocation, which data source
  • Why it likely failed — the FailureClassifier's deterministic root cause analysis
  • What the officer saw — the investigation results, confidence scores, and red flags at the time of the issue
  • What the system was doing — EVOI decisions, governance checks, agent execution order

An LLM pre-investigation agent can analyze this context and produce a structured summary: "The adverse media agent timed out after 45s on the BrightData source, causing incomplete OSINT results. This matches 3 similar incidents in the past 7 days, all involving Belgian entities with gazette lookups. Suggested action: check BrightData API status and increase timeout from 30s to 60s for gazette queries."

The support officer sees a pre-analyzed ticket with context, root cause hypothesis, and suggested remediation — not a blank form with "investigation results were wrong."

What Exists Today

  • Session reconstruction — full 6-table JOIN producing complete iteration context
  • Auto-classification — deterministic root cause analysis on negative feedback (rating 1-2)
  • Alert generation — significant classifications create DIAGNOSTIC_FINDING pattern alerts
  • Officer feedback UI — star rating + category chips + free-text comment

What Remains to Build

  • Ticket creation UI — button in case detail view to create a support ticket with pre-attached context
  • Support ticket data model -- support_tickets table with priority, assignee, SLA tracking, resolution workflow
  • LLM pre-investigation agent — PydanticAI agent that analyzes the diagnostic reconstruction and produces structured root cause analysis with remediation suggestions
  • Support dashboard — queue view for support officers with pre-analyzed tickets, filtered by severity and root cause category
  • Cross-ticket pattern detection — aggregate analysis of support tickets to surface systemic infrastructure issues (e.g., "BrightData timeout rate increased 300% this week")

Design Principles

  • Guard-and-swallow -- Diagnostic failures never break application code. Every recording function catches exceptions at the boundary and logs at DEBUG level.
  • Fire-and-forget -- Stage recording happens in the finally block of the @diagnostic_stage decorator, ensuring timing is captured even on failure.
  • No LLM dependency -- The FailureClassifier is purely rule-based, avoiding cost and latency for a system-health concern. The planned LLM pre-investigation runs asynchronously on the ticket, not in the investigation pipeline.
  • Cumulative telemetry -- Reconstruction joins across tables from different subsystems (Pillar 3.5 tool audit, Pillar 4 governance, EVOI) without requiring those systems to know about diagnostics.