ADR-0012: Hybrid Scraping Tool Selection

Date: 2026-01-15
Status: Implemented
Deciders: Adrian Birlogeanu

Context

Belgian OSINT investigations require data from multiple sources (KBO, Gazette, NBB, Notary, Inhoudingsplicht), each with a different access pattern.

Decision

Use a hybrid scraping strategy: choose the best tool for each data source rather than a single one-size-fits-all tool.

Tool Selection

Source | Tool | Rationale
KBO (Crossroads Bank) | Custom scraper | Structured HTML, stable format
Belgian Gazette | crawl4ai | Dynamic content, JS rendering
NBB (National Bank) | REST API | Official API available
Notary directory | BrightData MCP | Anti-bot protection
Inhoudingsplicht | crawl4ai + PEPPOL | Semi-structured government pages
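
The tool selection above can be captured as a small lookup table in code. The sketch below is illustrative only: the names `Tool`, `SourceConfig`, `SOURCE_TOOLS`, and `tool_for`, and the lowercase source keys, are assumptions and do not come from the project's codebase. It records the decision as data and does not call any of the scraping tools themselves.

```python
from dataclasses import dataclass
from enum import Enum


class Tool(Enum):
    """Tools named in the selection table."""
    CUSTOM_SCRAPER = "Custom scraper"
    CRAWL4AI = "crawl4ai"
    REST_API = "REST API"
    BRIGHTDATA_MCP = "BrightData MCP"
    CRAWL4AI_PEPPOL = "crawl4ai + PEPPOL"


@dataclass(frozen=True)
class SourceConfig:
    tool: Tool
    rationale: str


# Hypothetical registry keyed by a short source identifier (assumed naming).
SOURCE_TOOLS = {
    "kbo": SourceConfig(Tool.CUSTOM_SCRAPER, "Structured HTML, stable format"),
    "gazette": SourceConfig(Tool.CRAWL4AI, "Dynamic content, JS rendering"),
    "nbb": SourceConfig(Tool.REST_API, "Official API available"),
    "notary": SourceConfig(Tool.BRIGHTDATA_MCP, "Anti-bot protection"),
    "inhoudingsplicht": SourceConfig(
        Tool.CRAWL4AI_PEPPOL, "Semi-structured government pages"
    ),
}


def tool_for(source: str) -> Tool:
    """Return the tool chosen for a source, or fail loudly if none is configured."""
    try:
        return SOURCE_TOOLS[source.lower()].tool
    except KeyError:
        raise ValueError(f"No scraping tool configured for source: {source!r}") from None
```

Keeping the mapping in one place means that adding a sixth source, or switching a source to a different tool, is a one-line change that is easy to review against this ADR.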

Consequences

  • Multiple scraping dependencies to maintain
  • Per-source error handling required
  • Mock mode for each source in development
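
One possible shape for per-source error handling and mock mode is sketched below. All names here (`SourceError`, `mock_fetcher`, `build_fetchers`, `fetch_all`) are hypothetical, and the real fetchers are assumed to be plain callables that take a query string and return a dict. The point of the sketch is that a failure in one source is recorded with that source's name and does not stop the others.

```python
from typing import Callable, Dict, List, Tuple

# Assumed fetcher shape: takes a query string, returns a result dict.
Fetcher = Callable[[str], dict]


class SourceError(Exception):
    """Wraps a failure together with the name of the source that raised it."""

    def __init__(self, source: str, cause: Exception):
        super().__init__(f"{source}: {cause}")
        self.source = source
        self.cause = cause


def mock_fetcher(source: str) -> Fetcher:
    """Return a fetcher that yields canned data without any network access."""
    def fetch(query: str) -> dict:
        return {"source": source, "query": query, "mock": True}
    return fetch


def build_fetchers(live: Dict[str, Fetcher], mock: bool) -> Dict[str, Fetcher]:
    """In mock mode, replace every live fetcher with its mock counterpart."""
    if mock:
        return {name: mock_fetcher(name) for name in live}
    return dict(live)


def fetch_all(
    fetchers: Dict[str, Fetcher], query: str
) -> Tuple[Dict[str, dict], List[SourceError]]:
    """Query every source, collecting results and per-source errors separately."""
    results: Dict[str, dict] = {}
    errors: List[SourceError] = []
    for source, fetch in fetchers.items():
        try:
            results[source] = fetch(query)
        except Exception as exc:  # isolate the failure to this one source
            errors.append(SourceError(source, exc))
    return results, errors
```

In development, `build_fetchers(live, mock=True)` would let an investigation run end to end without touching any of the real sources. How the mock flag is actually set (environment variable, config file, CLI option) is left open here.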