Investigation Cost Monitoring

Per-agent, per-case, per-tenant cost tracking with automatic LLM pricing sync and an admin dashboard -- making every investigation's cost fully transparent and auditable.

Business Value

Compliance investigations consume LLM tokens (PydanticAI agents), external API credits (BrightData scraping, Tavily search, NorthData), and compute resources. Without cost visibility, pricing decisions are guesswork -- you cannot optimize what you cannot measure.

Cost monitoring solves this by recording the exact token consumption and external API calls for every agent in every investigation. Admins see total cost per case, cost trends over time, and per-provider balance depletion. This enables:

  • Pricing accuracy: know the true cost of each investigation to set customer pricing
  • Budget control: monitor BrightData/Tavily credit balances with automatic refresh
  • Optimization signals: identify which agents or countries cost the most
  • EU AI Act compliance: token-level audit trail for every AI decision (Art. 12 logging)

Architecture

Data Model

agent_executions (enriched columns)

Four columns added to the existing agent_executions table:

| Column | Type | Description |
|---|---|---|
| input_tokens | INTEGER | Prompt tokens consumed |
| output_tokens | INTEGER | Completion tokens generated |
| cost_eur | NUMERIC(10,6) | Total cost in EUR |
| cost_breakdown | JSONB | Detailed breakdown (LLM + API costs) |
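As a rough illustration of how the four columns relate, the sketch below builds one row's values from token counts and API call counts. The function name build_cost_row, the api_prices mapping, and the key names inside cost_breakdown are assumptions made for this example; the actual JSONB shape is not specified here.

```python
from decimal import Decimal, ROUND_HALF_UP

SIX_PLACES = Decimal("0.000001")  # same scale as NUMERIC(10,6)
MILLION = Decimal(1_000_000)


def _round6(value: Decimal) -> Decimal:
    return value.quantize(SIX_PLACES, rounding=ROUND_HALF_UP)


def build_cost_row(input_tokens, output_tokens, input_price, output_price,
                   api_calls, api_prices):
    """Return values for the four enriched columns (hypothetical helper)."""
    # LLM cost: tokens / 1M * price per 1M tokens
    llm_eur = (Decimal(input_tokens) / MILLION * input_price
               + Decimal(output_tokens) / MILLION * output_price)
    # API cost: call count * per-request price, per service
    api_eur = sum(
        (Decimal(count) * api_prices[service]
         for service, count in api_calls.items()),
        Decimal(0),
    )
    total_eur = _round6(llm_eur + api_eur)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_eur": total_eur,
        # Key names below are illustrative, not the documented schema.
        "cost_breakdown": {
            "llm_eur": str(_round6(llm_eur)),
            "api_eur": str(_round6(api_eur)),
            "total_eur": str(total_eur),
        },
    }


row = build_cost_row(
    input_tokens=12_000,
    output_tokens=1_500,
    input_price=Decimal("0.28"),
    output_price=Decimal("1.10"),
    api_calls={"tavily:search": 3},
    api_prices={"tavily:search": Decimal("0.005")},
)
```

Decimal is used instead of float so that the stored value matches the NUMERIC column exactly rather than carrying binary rounding noise.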

cost_pricing

Unit prices for LLM models and external APIs. LLM prices are synced from LiteLLM by the background sync loop (once shortly after startup, then every 6 hours):

| Column | Type | Description |
|---|---|---|
| service_name | VARCHAR(100) | e.g., openai/gpt-4.1-mini, brightdata:linkedin |
| input_price_eur | NUMERIC(12,8) | Price per 1M input tokens or per request |
| output_price_eur | NUMERIC(12,8) | Price per 1M output tokens (NULL for APIs) |
| unit | VARCHAR(30) | per_1m_tokens or per_request |
| source | VARCHAR(20) | litellm (auto-synced) or manual (admin override) |
| tenant_id | UUID | NULL = global, non-NULL = tenant-specific override |

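Row-level security (RLS) is enforced on this table. The partial unique index uq_cost_pricing_global guarantees uniqueness for global rows (tenant_id IS NULL), which an ordinary unique constraint would not do because PostgreSQL treats NULLs as distinct by default.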
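The global versus tenant-specific rows suggest a simple lookup order: use the tenant's override when one exists, otherwise fall back to the global row. The following is a minimal in-memory sketch of that rule. PriceRow and resolve_price are hypothetical names, and the precedence itself is inferred from the word "override" rather than confirmed by the schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Optional


@dataclass(frozen=True)
class PriceRow:
    """In-memory stand-in for one cost_pricing row (illustrative only)."""
    service_name: str
    input_price_eur: Decimal
    output_price_eur: Optional[Decimal]  # None for per-request APIs
    unit: str
    source: str
    tenant_id: Optional[str]  # None means the row is global


def resolve_price(rows: Iterable[PriceRow], service_name: str,
                  tenant_id: Optional[str]) -> Optional[PriceRow]:
    """Prefer the tenant-specific row; otherwise return the global row."""
    global_row = None
    for row in rows:
        if row.service_name != service_name:
            continue
        if row.tenant_id is not None and row.tenant_id == tenant_id:
            return row  # tenant override wins immediately
        if row.tenant_id is None:
            global_row = row
    return global_row


rows = [
    PriceRow("tavily:search", Decimal("0.005"), None,
             "per_request", "manual", None),
    PriceRow("tavily:search", Decimal("0.004"), None,
             "per_request", "manual", "tenant-a"),
]
override = resolve_price(rows, "tavily:search", "tenant-a")
fallback = resolve_price(rows, "tavily:search", "tenant-b")
```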

account_balances

External API provider credit balances, refreshed every 6 hours:

| Column | Type | Description |
|---|---|---|
| provider | VARCHAR(50) | brightdata or tavily |
| balance | NUMERIC(12,4) | Remaining credit balance |
| pending | NUMERIC(12,4) | Pending charges |
| checked_at | TIMESTAMPTZ | Last balance check timestamp |

Token Flow Pipeline

The token tracking pipeline bridges PydanticAI's RunResult.usage() to the cost database through a ContextVar:

  1. Agent runner (e.g., person_validation_agent.py) calls result.usage() after PydanticAI execution
  2. ContextVar bridge (agent_token_context.py) stores usage via set_agent_usage()
  3. _safe_* wrapper in osint_agent.py reads via get_and_clear_agent_usage(), passes input_tokens/output_tokens to update_status()
  4. update_status() in agent_progress_service.py calls calculate_cost() on terminal statuses
  5. calculate_cost() looks up pricing (5-min cached), computes llm_eur + api_eur = total_eur
  6. Result persisted to agent_executions.cost_eur and cost_breakdown (JSONB)

Three agents are wired: person_validation, adverse_media, social_intelligence. Non-LLM wrappers (PEPPOL, WHOIS, website scrape) have no token tracking since they don't invoke PydanticAI agents.
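Steps 2 and 3 of the pipeline can be sketched with the standard library's contextvars module. Only the function names set_agent_usage and get_and_clear_agent_usage come from the description above; their signatures, the AgentUsage dataclass, and the variable name are assumptions for illustration.

```python
import asyncio
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentUsage:
    """Token counts reported by one agent run (illustrative type)."""
    input_tokens: int
    output_tokens: int


# One slot per execution context; None means "nothing recorded yet".
_agent_usage: ContextVar[Optional[AgentUsage]] = ContextVar(
    "agent_usage", default=None
)


def set_agent_usage(input_tokens: int, output_tokens: int) -> None:
    """Called by the agent runner after the model call finishes."""
    _agent_usage.set(AgentUsage(input_tokens, output_tokens))


def get_and_clear_agent_usage() -> Optional[AgentUsage]:
    """Called by the wrapper; clears the slot so usage is not counted twice."""
    usage = _agent_usage.get()
    _agent_usage.set(None)
    return usage


async def fake_agent_runner() -> None:
    # Stand-in for an agent runner that has read the usage from its result.
    set_agent_usage(input_tokens=1200, output_tokens=300)


async def safe_wrapper() -> Optional[AgentUsage]:
    await fake_agent_runner()  # awaited directly, so the context is shared
    return get_and_clear_agent_usage()


recorded = asyncio.run(safe_wrapper())
```

One design consequence worth noting: an asyncio task runs in a copy of the context that was current when it was created, so a value set inside a task started with asyncio.create_task() is not visible to the caller. A bridge like this only works when the wrapper awaits the runner directly.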

Cost Calculation

# LLM cost: tokens / 1M * price_per_1M
llm_eur = (input_tokens / 1_000_000 * input_price) + (output_tokens / 1_000_000 * output_price)

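# API cost: count * per_request_price, looked up per service
# (api_prices is an illustrative name for the service -> price mapping)
api_eur = sum(count * api_prices[service] for service, count in external_api_calls.items())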

# Total
total_eur = llm_eur + api_eur

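Prices are treated as EUR throughout the calculation; note that LiteLLM-synced prices are currently USD values stored without conversion (see Known Limitations). The pricing cache (_pricing_cache) uses a 5-minute TTL measured with time.monotonic(), so adjustments to the system clock cannot shorten or extend an entry's lifetime.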
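A TTL cache built on time.monotonic() might look like the sketch below. The PricingCache class, its loader callback, and the injectable clock are assumptions for illustration; the real _pricing_cache may be structured differently.

```python
import time
from typing import Callable, Dict, Optional


class PricingCache:
    """Caches the result of a pricing loader for ttl_seconds."""

    def __init__(self, loader: Callable[[], Dict[str, float]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._data: Optional[Dict[str, float]] = None
        self._loaded_at = 0.0

    def get(self) -> Dict[str, float]:
        now = self._clock()
        # Reload when nothing is cached or the entry is older than the TTL.
        if self._data is None or now - self._loaded_at >= self._ttl:
            self._data = self._loader()
            self._loaded_at = now
        return self._data

    def invalidate(self) -> None:
        """Drop the cached data, e.g. after an admin price override."""
        self._data = None


# Usage with a fake clock to show the expiry behaviour.
fake_now = [0.0]
load_count = [0]


def load_prices() -> Dict[str, float]:
    load_count[0] += 1
    return {"tavily:search": 0.005}


cache = PricingCache(load_prices, ttl_seconds=300.0,
                     clock=lambda: fake_now[0])
cache.get()            # first call loads
fake_now[0] = 299.0
cache.get()            # still fresh, no reload
fake_now[0] = 300.0
cache.get()            # TTL reached, reloads
```

The monotonic clock only moves forward, which is why it suits interval measurement better than time.time().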

Pricing Sync

The background sync loop runs in the FastAPI lifespan:

| Cycle | Interval | What |
|---|---|---|
| Initial | 10s after startup | LiteLLM pricing sync |
| Recurring | Every 6 hours | BrightData balance, Tavily balance, LiteLLM pricing |

LiteLLM sync fetches the model_prices_and_context_window.json from GitHub, parses per-token costs into per-1M-token prices, and upserts with ON CONFLICT ... WHERE tenant_id IS NULL. Manual admin overrides (source = 'manual') are never overwritten.
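The per-token to per-1M-token conversion can be sketched as follows. The field names input_cost_per_token, output_cost_per_token, and litellm_provider are the ones used in LiteLLM's published JSON as far as I know. The rule for composing service_name as provider/model, the skipping of a "sample_spec" entry, and both function names are assumptions for illustration.

```python
from decimal import Decimal
from typing import Dict, Set, Tuple

PER_MILLION = Decimal(1_000_000)
Prices = Dict[str, Tuple[Decimal, Decimal]]


def parse_litellm_prices(payload: dict) -> Prices:
    """Map service name -> (input, output) price per 1M tokens.

    Values stay in the currency of the upstream file (USD).
    """
    prices: Prices = {}
    for model, spec in payload.items():
        # Assumption: the upstream file has a documentation-only entry.
        if model == "sample_spec" or not isinstance(spec, dict):
            continue
        in_cost = spec.get("input_cost_per_token")
        out_cost = spec.get("output_cost_per_token")
        if in_cost is None or out_cost is None:
            continue  # no token pricing, nothing to store
        provider = spec.get("litellm_provider", "")
        # Assumption: service_name is "<provider>/<model>".
        if provider and not model.startswith(provider + "/"):
            name = f"{provider}/{model}"
        else:
            name = model
        # str() avoids carrying binary float noise into Decimal.
        prices[name] = (Decimal(str(in_cost)) * PER_MILLION,
                        Decimal(str(out_cost)) * PER_MILLION)
    return prices


def rows_to_upsert(parsed: Prices, manual_services: Set[str]) -> Prices:
    """Drop services that have a manual override, so they are never touched."""
    return {name: p for name, p in parsed.items()
            if name not in manual_services}


sample = {
    "gpt-4.1-mini": {
        "litellm_provider": "openai",
        "input_cost_per_token": 4e-07,
        "output_cost_per_token": 1.6e-06,
    },
    "some-embedding-model": {"litellm_provider": "openai"},
}
parsed = parse_litellm_prices(sample)
to_write = rows_to_upsert(parsed, manual_services={"openai/gpt-4.1-mini"})
```

In the database the same protection would normally live in the upsert statement itself; filtering in Python here just makes the rule visible.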

Balance sync fetches credit balances from the BrightData (GET /customer/balance) and Tavily (GET /usage) APIs and upserts them into the account_balances table.
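The schedule described in this section (one pricing sync shortly after startup, then all three jobs every 6 hours) can be sketched as a plain asyncio coroutine. The names run_sync_loop and _run_safely, the max_cycles parameter, and the choice to log and continue when a job fails are assumptions for illustration, not a description of the actual loop.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

Job = Callable[[], Awaitable[None]]


async def _run_safely(job: Job) -> None:
    """Run one job; a failure is reported but does not stop the loop."""
    try:
        await job()
    except Exception as exc:  # cancellation is not an Exception, so it propagates
        print(f"sync job {getattr(job, '__name__', job)!r} failed: {exc}")


async def run_sync_loop(initial_job: Job,
                        recurring_jobs: Sequence[Job],
                        initial_delay: float = 10.0,
                        interval: float = 6 * 60 * 60,
                        max_cycles: Optional[int] = None) -> None:
    """Initial sync after a short delay, then recurring syncs forever.

    max_cycles exists only so the sketch can terminate in a demo.
    """
    await asyncio.sleep(initial_delay)
    await _run_safely(initial_job)
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        await asyncio.sleep(interval)
        for job in recurring_jobs:
            await _run_safely(job)
        cycles += 1


calls = []


async def sync_pricing() -> None:
    calls.append("pricing")


async def sync_brightdata() -> None:
    calls.append("brightdata")


async def sync_tavily() -> None:
    calls.append("tavily")
    raise RuntimeError("provider unavailable")  # simulated failure


asyncio.run(run_sync_loop(
    sync_pricing,
    [sync_brightdata, sync_tavily, sync_pricing],
    initial_delay=0, interval=0, max_cycles=2,
))
```

In a FastAPI lifespan handler, a coroutine like this would typically be started with asyncio.create_task() before the yield and cancelled after it, so the loop stops cleanly on shutdown.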

Admin API

Eight endpoints under /api/admin/costs/ (router prefix /admin/costs mounted at /api), protected by require_role("super_admin", "tenant_admin"):

| Method | Path | Description |
|---|---|---|
| GET | /agent-summary | Per-agent cost summary over a period (24h/7d/30d) |
| GET | /case/{workflow_id} | Per-case cost breakdown by agent |
| GET | /summary | Aggregate costs with day/country filters |
| GET | /trends | Daily/weekly cost trends for charting |
| GET | /pricing | Current pricing table |
| PUT | /pricing/{id} | Admin price override (invalidates cache) |
| GET | /balances | BrightData + Tavily credit balances |
| POST | /sync-pricing | Force immediate LiteLLM sync |

Admin Dashboard

Three-tab React page at /admin/costs:

  • Overview: stat cards (total cost, avg/case, case count, tokens), recharts AreaChart trend, provider balance cards, top cases table
  • Case Costs: expandable table with per-agent cost breakdown
  • Pricing: editable pricing table with source badges (litellm=auto, manual=override), "Sync from LiteLLM" button

Seed Data

Migration 046 seeds 8 pricing entries (EUR, manual source):

| Service | Input Price | Output Price | Unit |
|---|---|---|---|
| openai/gpt-5.2 | 2.30 | 9.20 | per_1m_tokens |
| openai/gpt-4.1-mini | 0.28 | 1.10 | per_1m_tokens |
| anthropic/claude-sonnet-4-5 | 2.76 | 13.80 | per_1m_tokens |
| brightdata:scrape | 0.01 | -- | per_request |
| brightdata:linkedin | 0.02 | -- | per_request |
| brightdata:social | 0.01 | -- | per_request |
| tavily:search | 0.005 | -- | per_request |
| northdata:api | 0.01 | -- | per_request |

LiteLLM sync overwrites the three LLM model entries; the five per-request API entries are not part of the LiteLLM sync and keep their seeded values.

Known Limitations (PoC)

  • USD/EUR conversion: LiteLLM prices are stored as-is (USD). For PoC, the ~8% difference is acceptable. Production should apply a configurable FX rate at sync time.
  • API cost attribution: External API call counts are tracked at case level, not agent level. The aggregate per-case cost is accurate, but per-agent API cost breakdown may include calls from other agents in the same case.
  • Token tracking coverage: Only 3 of 13 agents currently pass tokens (person_validation, adverse_media, social_intelligence). Remaining agents need wiring as they're upgraded to PydanticAI.
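For the USD/EUR limitation above, applying a configurable FX rate at sync time could be as small as the sketch below. The function name usd_to_eur and the idea of passing the rate as a parameter are assumptions, and 0.92 is only an illustrative rate, roughly in line with the ~8% difference mentioned.

```python
from decimal import Decimal, ROUND_HALF_UP

EIGHT_PLACES = Decimal("0.00000001")  # same scale as NUMERIC(12,8)


def usd_to_eur(price_usd: Decimal, eur_per_usd: Decimal) -> Decimal:
    """Convert a USD unit price to EUR before it is written to cost_pricing."""
    if eur_per_usd <= 0:
        raise ValueError("FX rate must be positive")
    return (price_usd * eur_per_usd).quantize(
        EIGHT_PLACES, rounding=ROUND_HALF_UP
    )


rate = Decimal("0.92")  # illustrative; would come from configuration
input_eur = usd_to_eur(Decimal("3.00"), rate)
output_eur = usd_to_eur(Decimal("15.00"), rate)
```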