
ADR-0079 — Estonian company financials via RIK e-Äriregister open data

Status: Accepted
Date: 2026-06-28
Deciders: Compliance engineering (with the OB-Holding reachability investigation oracle)
Context source: docs/research/2026-06-28-ob-holding-reachability-investigation.md

Context

OB Holding 1 OÜ's real FY2024 financials (turnover €42,538,000, net profit €24,601,000, assets €57.45m, equity €42.42m) read as "not assessed" in our output. This is a deliberate non-fetch, not a data absence:

  • ee_ariregister_service.py:338-341 hard-codes financials_summary="Financial data not available via Äriregister public API (requires e-äriregister portal access)." on every found-company path. We call only the autocomplete + /company/{reg}/json "card" endpoints (identity/persons only).
  • osint_agent.py:_fetch_country_financials (625-653) has per-country fetchers for NO/RO/SK/NL but no EE branch, so the synthesis fallback can't fill the gap. The country-agnostic NorthData fallback then returns little (NorthData is DACH-centric and excludes EE/LV/LT).

The figures are free and public. The 2026-06-28 investigation verified live (HTTP 200) the RIK avaandmed open-data annual-report element dump (4.2024_aruannete_elemendid…zip, 23.3MB, application/zip, last-modified 2026-06-09), which carries Revenue / Total assets / Equity / Operating-and-period profit per registrikood per year, CC-BY-4.0, no auth. This also addresses ADR-0073 R11 (financials-ingestion gap: a missing financial statement must read as an honest gap, never a benign "no financials").

The proven pattern already exists for other countries: a thin fetch_financials(reg) module (e.g. no_regnskapsregisteret.py) returning financial_utils.build_fhr(snapshots, source=…), wired into the agent's financial_health_report= field and consumed unchanged by the financial-analysis agent (ADR-0048).
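The sketch below shows that thin-module shape. It assumes a simplified stand-in for financial_utils.build_fhr, because this ADR gives only its (snapshots, source=…) call form. The fetcher wraps snapshots when it has them and returns an empty dict otherwise, so callers treat {} as a gap rather than as "no financials". The Snapshot type, _stub_build_fhr and the injected lookup callable are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Optional


@dataclass(frozen=True)
class Snapshot:
    """One fiscal year of key indicators (hypothetical field set)."""
    year: int
    revenue: Optional[int] = None
    total_assets: Optional[int] = None
    equity: Optional[int] = None
    profit_loss: Optional[int] = None


def _stub_build_fhr(snapshots: list[Snapshot], source: str) -> dict:
    """Hypothetical stand-in for financial_utils.build_fhr (not its real code)."""
    ordered = sorted(snapshots, key=lambda s: s.year, reverse=True)
    return {"source": source, "snapshots": [asdict(s) for s in ordered]}


def fetch_financials(
    reg_code: str,
    lookup: Callable[[str], list[Snapshot]],
    source: str = "RIK e-Äriregister Open Data",
) -> dict:
    """Thin fetcher: look up snapshots and wrap them, or return {} as a gap."""
    snapshots = lookup(reg_code)
    if not snapshots:
        return {}  # caller must surface this as a gap, never as "no financials"
    return _stub_build_fhr(snapshots, source=source)
```

The lookup is injected here so the shape can be tested without the ingest table. A real module would read the local avaandmed table instead.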

Decision

Add Estonian financials as a real, sourced data channel; replace the hard-coded "not available" string with either sourced figures or an honest, structured gap.

  1. RIK avaandmed open-data fetcher (primary, FREE). New app/services/registries/ee_ariregister_financials.py:fetch_financials(reg_code) that:

    • ingests the per-year "report elements" open-data dumps into a local table keyed by registrikood + year (the dumps are 23MB ZIPs — this is an ingest + monthly-refresh job, not a per-request download), recording the dump's last-modified / as-of date;
    • ingests only the financial elements + registrikood from the "general info" dump — not natural-person director fields — so EE open-data ingestion stays outside the PII regime (app/pii/); CC-BY-4.0 attribution is met by the source= label;
    • builds the report_id → registrikood join, attributes only on exact registrikood match, and runs a corroboration/sanity check (magnitude cross-check against the NorthData/aggregator figures from #5) before presenting a figure as the subject's — a join-key error would surface a real-but-wrong company's numbers, the financial analog of the R9 name-collision (EU AI Act Art. 15 accuracy);
    • maps Revenue→revenue, Total assets→total_assets, Equity→equity, and the named net-profit et-gaap element→profit_loss (pin the exact element with documented semantics and do not conflate operating profit with net profit; the verified OB Holding figure is net profit €24,601,000) per year;
    • returns build_fhr(snapshots, source='RIK e-Äriregister Open Data') carrying the as-of date. Mirror the proven no_regnskapsregisteret.py shape.
  2. Wire EE into the pipeline. Add an EE branch to osint_agent.py:_fetch_country_financials; replace the hard-coded financials_summary in ee_ariregister_service.py:run_ee_agent with the fetched FHR (or empty + gap). The FHR is consumed unchanged by financial_analysis_agent.run_financial_analysis.

  3. Remove the misleading hard-code (Wave 0). Delete the literal financials_summary='Financial data not available…' and the early-return blocks that also set it. Distinguish two gap shapes: an unsupported jurisdiction emits the ADR-0080 country_capability_gap(FINANCIAL_STATEMENTS, country); a supported jurisdiction with the datum absent (e.g. EE post-launch, FY not present as of the snapshot) emits an ADR-0073 R11 financials_ingestion_gap / ADR-0067 data-gap — not a capability gap (declaring "not assessed for Estonia" when the channel works would understate our coverage to a regulator). Mirror the CZ filings_only→upgrade pattern in cz_ares_service.py.

  4. Optional higher-fidelity tier (RIK SOAP/XBRL). A second function fetch_financials_soap calling arireg.majandusaastaAruanneteKirjed_v1 (after the …Loetelu_v1 list) parsing et-gaap XBRL <vaartus> line items — selected ahead of open data when credentials are configured. This requires a signed RIK agreement (organizational lead-time), gated behind new config.py fields (ariregister_xml_user/password/enabled). Open data (decision #1) is the no-contract fallback.

  5. NorthData financials parse (corroboration stopgap). NorthData already fetches OB Holding's page; extend extraction to parse the financial block it renders and map to build_fhr(source='NorthData') — lowest-friction corroboration since the HTTP path already exists. Aggregator-sourced figures are labelled as such (not the official register).
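The join, mapping and sanity check in decision #1 can be sketched as follows. The sketch assumes both dumps are already parsed into tuples:

- (report_id, registrikood, year) from the general-info dump. Only these columns are read, so no natural-person fields are ingested.
- (report_id, element_label, value) from the elements dump.

These row shapes, the helper names, the NET_PROFIT_ELEMENT placeholder and the 10x threshold are all hypothetical. The map below is illustrative, not the real _ELEMENT_FIELD_MAP.

```python
from collections import defaultdict
from typing import Iterable

# Illustrative label -> FHR field map, not the real _ELEMENT_FIELD_MAP.
# "Revenue", "Total assets", "Equity", "Varad" and "Assets" are named in this
# ADR; "NET_PROFIT_ELEMENT" is a hypothetical placeholder for the pinned
# net-profit et-gaap element.
_ELEMENT_FIELD_MAP = {
    "Revenue": "revenue",
    "Total assets": "total_assets",
    "Varad": "total_assets",
    "Assets": "total_assets",
    "Equity": "equity",
    "NET_PROFIT_ELEMENT": "profit_loss",
    # Operating profit is deliberately unmapped: never stand it in for net profit.
}


def snapshots_for(
    reg_code: str,
    general_rows: Iterable[tuple[str, str, int]],
    element_rows: Iterable[tuple[str, str, int]],
) -> dict[int, dict[str, int]]:
    """Build {year: {field: value}} for one registrikood."""
    # report_id -> year, kept only on an exact registrikood match (no fuzzing).
    reports = {
        report_id: year
        for report_id, registrikood, year in general_rows
        if registrikood == reg_code
    }
    by_year: dict[int, dict[str, int]] = defaultdict(dict)
    for report_id, label, value in element_rows:
        year = reports.get(report_id)
        field = _ELEMENT_FIELD_MAP.get(label)
        if year is None or field is None:
            continue  # other company's report, or an element we do not map
        # First mapped value per field wins; synonym labels are assumed to agree.
        by_year[year].setdefault(field, value)
    return dict(by_year)


def magnitude_consistent(ours: float, other: float, max_ratio: float = 10.0) -> bool:
    """Coarse cross-check: both positive and within max_ratio of each other.

    max_ratio=10 (one order of magnitude) is an assumed threshold.
    """
    if ours <= 0 or other <= 0:
        return False  # cannot corroborate a non-positive figure this way
    return max(ours, other) / min(ours, other) <= max_ratio
```

A figure that fails magnitude_consistent against the NorthData or aggregator value would be held back as unattributed rather than shown as the subject's.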

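Decisions #3 and #4 set a resolution order: the SOAP tier when credentials are configured, open data otherwise, and two distinct gap shapes. The sketch below shows that order. Only the gap names (country_capability_gap, financials_ingestion_gap), the FINANCIAL_STATEMENTS capability and the ariregister_xml_* field names come from this ADR. The gap dict layout, the SUPPORTED_FINANCIALS set and the injected fetchers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

FINANCIAL_STATEMENTS = "FINANCIAL_STATEMENTS"
# Illustrative only: jurisdictions with a financials channel in this ADR.
SUPPORTED_FINANCIALS = {"EE", "NO", "RO", "SK", "NL"}

Fetcher = Callable[[str], dict]


@dataclass
class Settings:
    # Names mirror the config.py fields in decision #4; defaults are assumed.
    ariregister_xml_enabled: bool = False
    ariregister_xml_user: Optional[str] = None
    ariregister_xml_password: Optional[str] = None


def soap_configured(s: Settings) -> bool:
    return bool(s.ariregister_xml_enabled and s.ariregister_xml_user
                and s.ariregister_xml_password)


def resolve_financials(country: str, reg_code: str, settings: Settings,
                       fetch_soap: Fetcher, fetch_open: Fetcher,
                       snapshot_as_of: str) -> tuple[str, dict]:
    """Return ("fhr", report) or ("gap", record); never a benign empty result."""
    if country not in SUPPORTED_FINANCIALS:
        # Unsupported jurisdiction: ADR-0080 capability gap.
        return "gap", {"kind": "country_capability_gap",
                       "capability": FINANCIAL_STATEMENTS, "country": country}
    # SOAP/XBRL tier first when configured; assumed here that an empty SOAP
    # result falls through to the open-data fetcher.
    fetchers = [fetch_soap] if soap_configured(settings) else []
    fetchers.append(fetch_open)
    for fetch in fetchers:
        report = fetch(reg_code)
        if report:
            return "fhr", report
    # Supported jurisdiction, datum absent: ADR-0073 R11 ingestion gap that
    # carries the snapshot date instead of asserting "not filed".
    return "gap", {"kind": "financials_ingestion_gap", "country": country,
                   "detail": f"not present as of {snapshot_as_of}"}
```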
Consequences

  • The €42.54m FY2024 turnover (and prior years) surfaces from a free, official source, feeding the existing ratio/distress analysis (ADR-0048) instead of going_concern="insufficient_data".
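  • ADR-0073 R11 satisfied: financials are now fetched where available, and a genuine absence reads as an audited gap (an R11 financials_ingestion_gap where the channel exists; the ADR-0080 capability gap only for unsupported jurisdictions), never a benign "no financials."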
  • Honest residuals (must remain gaps, never inferred): (a) a recent FY may be absent because the company has not filed or because our monthly snapshot is stale — we cannot observe filing behaviour, so the gap is surfaced as "not present as of <snapshot date>" (carrying the avaandmed last-modified date), never asserted as "not filed"; the per-case as-of snapshot persists into the evidence bundle (ADR-0021/0063), not just the live refreshed table; (b) the free open data carries key indicators only — note-level statements need the SOAP/XBRL tier (RIK-agreement-gated); (c) GLEIF/BRIS/OpenCorporates carry no EE financials (identity/existence only); (d) aggregators (Inforegister/Okredo/NorthData) are labelled aggregator-sourced unless fetched from the e-Äriregister filing itself.
  • New operational surface: the avaandmed ingest needs a scheduled refresh and storage. Budget as M/L, not a trivial fetch.

Addendum — lazy on-demand auto-fetch + parsing correction (2026-06-29)

Three updates from wiring this into the live flow and validating against the real dump.

1. Lazy on-demand ingest (supersedes "scheduled job only" for the default path). The original decision deferred ingestion to a scheduled bulk job and left fetch_financials returning {} until that job ran — which meant EE financials were inert in practice (every live case showed "not assessable"). We now add a lazy auto-fetch (ee_avaandmed_autofetch.py, default ee_avaandmed_autofetch_enabled=True): on the first EE financial lookup with no pre-ingested path, the module downloads the current avaandmed report files once into a local cache and stream-extracts the company. Design guards:

  • URL discovery, not hardcoding — the filenames carry a monthly snapshot suffix (…kuni_31052026…), so the live URLs are scraped from the open-data page each refresh; a pinned URL would break monthly.
  • Cache + TTL (ee_avaandmed_cache_dir, ee_avaandmed_cache_ttl_days=30), single-flight (asyncio.Lock so concurrent investigations download once), per-company memoisation, stream parsing of the 200–300MB CSVs (bounded memory, early-break on the report-id-sorted element file).
  • Fail-closed — discovery/download/parse failure → [] (honest gap, not cached, retries next time); a genuinely absent company caches [] (fast, honest on repeat). The pre-ingested ee_avaandmed_ingest_path still wins when set (scheduled bulk ingest remains supported for high-volume deployments). The scheduled job is now an optional optimisation, not a precondition.
  • Cost/latency trade: the first EE lookup pays a one-time ~40MB download + a multi-hundred-MB scan (cached after). Acceptable for this workload; a future SQLite index would make repeat per-company lookups instant.
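URL discovery can be sketched as a scrape of the open-data page for links that carry the kuni_DDMMYYYY snapshot suffix, keeping only the newest snapshot. That snapshot date is also the as-of date to record. Several things here are assumptions about the page layout, not its documented structure: the link pattern, the DDMMYYYY reading of the suffix and the .zip extension. The function takes already-fetched HTML; a real implementation would fetch the page over HTTP first.

```python
import re
from datetime import date
from typing import Optional

# Assumed link shape: "...kuni_DDMMYYYY...zip", e.g. "..._kuni_31052026.zip".
_LINK_RE = re.compile(r'href="([^"]*kuni_(\d{2})(\d{2})(\d{4})[^"]*\.zip)"')


def latest_snapshot_urls(page_html: str) -> tuple[Optional[date], list[str]]:
    """Return (snapshot_date, urls) for the newest snapshot linked on the page."""
    by_date: dict[date, list[str]] = {}
    for url, dd, mm, yyyy in _LINK_RE.findall(page_html):
        try:
            snap = date(int(yyyy), int(mm), int(dd))
        except ValueError:
            continue  # malformed suffix: ignore rather than guess a date
        by_date.setdefault(snap, []).append(url)
    if not by_date:
        return None, []  # fail closed: caller surfaces an honest gap
    newest = max(by_date)
    return newest, sorted(by_date[newest])
```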

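This is a minimal asyncio sketch of how the cache, single-flight and fail-closed guards fit together; it is not the ee_avaandmed_autofetch.py implementation. Assumptions:

- The download and extract callables are injected placeholders.
- The cache is an in-memory stand-in for ee_avaandmed_cache_dir.
- Only the 30-day TTL mirrors ee_avaandmed_cache_ttl_days=30.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

TTL_SECONDS = 30 * 24 * 3600  # mirrors ee_avaandmed_cache_ttl_days=30


class AvaandmedCache:
    """In-memory stand-in for the on-disk avaandmed cache (hypothetical)."""

    def __init__(self, download: Callable[[], Awaitable[str]],
                 extract: Callable[[str, str], list]) -> None:
        self._download = download    # placeholder: fetch files, return a local path
        self._extract = extract      # placeholder: stream-extract one company's rows
        self._lock = asyncio.Lock()  # single-flight: one bulk download at a time
        self._path: Optional[str] = None
        self._fetched_at = 0.0
        self._memo: dict[str, list] = {}  # per-company memoisation

    async def _ensure_files(self) -> str:
        async with self._lock:
            stale = (self._path is None
                     or time.monotonic() - self._fetched_at >= TTL_SECONDS)
            if stale:
                self._path = await self._download()
                self._fetched_at = time.monotonic()
                self._memo.clear()  # a new snapshot invalidates memoised rows
            return self._path

    async def lookup(self, reg_code: str) -> list:
        if reg_code in self._memo:
            return self._memo[reg_code]
        try:
            path = await self._ensure_files()
            rows = self._extract(path, reg_code)
        except Exception:
            return []  # fail closed: honest gap, NOT cached, retried next time
        self._memo[reg_code] = rows  # [] for a genuinely absent company is cached
        return rows
```

Concurrent lookups queue on the lock. Only the first one downloads, and the rest see fresh files when they acquire it.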
2. Parsing correction: OB Holding's financials ARE in the open data. A mid-investigation grep '"<report_id>"' (quoted) wrongly returned zero and led to a brief, wrong conclusion that OB Holding (14975047) was a "PDF-only filer" absent from the elements dump. Cause: the report_id column is unquoted in data rows (only text fields are quoted), so the quoted grep never matched; the csv parser reads it correctly. Verified: report_id 3323168 carries the 20 FY2024 elements, among them Revenue €42,538,000, Assets €57,453,000, Equity €42,420,000 and net profit €24,601,000, matching Inforegister exactly. Lesson (encoded as a ground-truth test): validate structured data with the parser, never a grep on a quoted field. Also extended _ELEMENT_FIELD_MAP to the real balance-sheet labels (bare Varad/Assets→total_assets), which the original idealised map missed.
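The grep pitfall is easy to reproduce. The CSV sample below is invented: only report_id 3323168 and its figures come from this ADR, and the column names and ";" delimiter are assumptions. It follows the quoting convention described above, with numeric report_id unquoted and text fields quoted. A quoted-substring search misses the rows that csv.DictReader finds. The reader also shows the early break, assuming the element file is sorted numerically by report_id.

```python
import csv
import io

# Invented sample in the described shape: numeric report_id unquoted,
# text fields quoted. Column names and delimiter are assumptions.
SAMPLE = (
    'report_id;element;value\n'
    '3323167;"Revenue";100\n'
    '3323168;"Revenue";42538000\n'
    '3323168;"Assets";57453000\n'
    '3323169;"Revenue";7\n'
)


def naive_grep(text: str, report_id: str) -> list[str]:
    """The wrong check: search for the id as if it were a quoted field."""
    return [line for line in text.splitlines() if f'"{report_id}"' in line]


def stream_elements(stream, report_id: int) -> list[dict]:
    """Parse with the csv module; stop once past report_id (file sorted by id)."""
    rows = []
    for row in csv.DictReader(stream, delimiter=";"):
        current = int(row["report_id"])
        if current == report_id:
            rows.append(row)
        elif current > report_id:
            break  # early break: in a sorted file nothing later can match
    return rows
```

A ground-truth test in this spirit asserts naive_grep(...) == [] alongside the parser's non-empty result, so the failure mode stays documented.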

3. Inforegister.ee fallback + cross-source corroboration. ee_inforegister_financials.py is a fallback (and second-source check) for the cases avaandmed can't serve: a company whose report content is not yet in the structured open-data elements (very recent or micro filings). It reads the same RIK figures that Inforegister republishes free and without login as embedded chart JSON (<script type="application/json"> blocks), parsed per block by the block's own year header so a quarterly series can't be mistaken for the annual figure; forecast columns ("2025*") are excluded. It is fallback-only (used by _fetch_country_financials when the official avaandmed path returns {}) and labelled aggregator-sourced (source="Inforegister.ee (aggregator)" + source_url), per consequence (d); it is never presented as the primary registry. Honest residuals: equity is not in the chart JSON (omitted, not guessed); a slug mismatch 404s to an honest gap (no fuzzy fetch). Validated live: OB Holding FY2024 revenue €42,538,000 / net profit €24,601,000 / assets €57,453,000, identical to avaandmed, giving an independent cross-source confirmation.
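Below is a sketch of the per-block parsing rule. It assumes a hypothetical JSON shape: each block carries a "metric" name, a "columns" list of period headers and a parallel "values" list. Inforegister's actual embedded schema is not documented here and would need to be pinned against a saved page. Only headers that are a bare four-digit year count as annual, which drops both forecast ("2025*") and sub-annual headers. A metric with no annual block, such as equity, simply comes back empty instead of being guessed.

```python
import json
import re

_SCRIPT_RE = re.compile(r'<script type="application/json">(.*?)</script>', re.DOTALL)
_ANNUAL_HEADER = re.compile(r"\d{4}")  # fullmatch: "2024" yes; "2025*", "2024 Q1" no


def annual_series(page_html: str, metric: str) -> dict[int, float]:
    """Extract {year: value} for one metric from embedded chart JSON blocks.

    Assumed block shape: {"metric": str, "columns": [str], "values": [num]}.
    """
    out: dict[int, float] = {}
    for raw in _SCRIPT_RE.findall(page_html):
        try:
            block = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip unparseable blocks rather than guess
        if not isinstance(block, dict) or block.get("metric") != metric:
            continue
        for header, value in zip(block.get("columns", []), block.get("values", [])):
            # Each value is dated by its own block's header only.
            if _ANNUAL_HEADER.fullmatch(str(header)) and value is not None:
                out[int(header)] = value
    return out
```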

Alternatives considered

  • Commercial EE credit bureau (Creditinfo Eesti / Inforegister REST) as primary. Rejected as first move — contract + cost — when the official open data is free and verified. Kept as an optional config-gated channel (Wave 6) if the free path proves insufficient.
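  • Per-request download+parse of the 23MB ZIP. Rejected: not viable per request; it requires the ingest table + refresh job in decision #1. (The 2026-06-29 addendum's lazy auto-fetch is not per-request either: it downloads once per cache TTL and memoises per company.)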
  • Keep the hard-coded string. Rejected — it is actively misleading (asserts unavailability where free data exists) and violates ADR-0067 honesty.

References

ADR-0073 (R11 financials-ingestion gap), ADR-0048 (financial-analysis agent — unchanged consumer), ADR-0068 (country-capability registry), ADR-0080 (FINANCIAL_STATEMENTS capability signal — pairs with this), ADR-0067 (fail-closed honesty), ADR-0021 (evidence bundle) / ADR-0063 (append-only persistence — per-case as-of snapshot retention), ee_ariregister_service.py, no_regnskapsregisteret.py, financial_utils.py, research doc docs/research/2026-06-28-ob-holding-reachability-investigation.md.