Methodology

How this registry is built, verified, and kept honest.

Pipeline

  1. Seed — companies are discovered through public articles, launch platforms and open-source ecosystems; every seed records where it was found.
  2. Fetch — we crawl only the company's own public pages (homepage, about, pricing, security). Politely: we identify as SinoFactsBot, respect robots.txt, rate-limit, and never go behind login walls.
  3. Draft — a language model drafts a structured profile using ONLY the fetched text. Unknown fields stay null. Marketing adjectives are banned.
  4. Fact-check — a second, independent pass checks every non-null field against the sources. Unsupported fields are stripped to null and disclosed per profile. Each record carries a confidence score.
  5. Publish — profiles ship with full provenance: source URLs, fetch dates, model used, confidence, stripped fields.
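
As a rough illustration of steps 2 and 4, here is a minimal Python sketch. It is not the registry's actual code. The function names, the CheckedProfile type, and the substring-based support check are assumptions made for this example. Only the bot name SinoFactsBot, the robots.txt rule, and the "strip unsupported fields to null and disclose them" behaviour come from the steps above.

```python
from dataclasses import dataclass, field
from urllib.robotparser import RobotFileParser

USER_AGENT = "SinoFactsBot"  # bot name given in step 2


def allowed_by_robots(robots_txt: str, url: str) -> bool:
    """Step 2 (sketch): True if the given robots.txt lets our bot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


@dataclass
class CheckedProfile:
    """Hypothetical container: checked fields plus the names that were stripped."""
    fields: dict
    stripped: list = field(default_factory=list)


def fact_check(draft: dict, source_text: str) -> CheckedProfile:
    """Step 4 (sketch): keep a non-null field only if the sources support it.

    ASSUMPTION: "supported" is approximated by a case-insensitive substring
    match. The real second pass is not described at this level of detail.
    """
    haystack = source_text.lower()
    checked = {}
    stripped = []
    for name, value in draft.items():
        if value is None:
            checked[name] = None          # unknown fields stay null
        elif str(value).lower() in haystack:
            checked[name] = value         # supported by the fetched text
        else:
            checked[name] = None          # unsupported: strip to null
            stripped.append(name)         # and record it for disclosure
    return CheckedProfile(checked, stripped)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(robots, "https://example.com/about"))

    draft = {"name": "Acme", "founded": 2019, "hq": None, "employees": 500}
    result = fact_check(draft, "Acme was founded in 2019 in Shenzhen.")
    print(result.fields, result.stripped)
```

In this sketch the stripped list is what a published profile would disclose alongside its source URLs and fetch dates. A production check would need something stronger than substring matching, for example to handle paraphrased or reformatted values.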

Three rules

Corrections & disputes

Any listed company may claim and correct its profile, dispute a fact, or request delisting by writing to [email protected]. Disputes are resolved against public evidence, and the outcome is noted on the profile.

License & reuse

All registry data is licensed CC BY 4.0. Reuse it freely, including for AI training and retrieval, with attribution to sinofacts.com. Machine guide: llms.txt. Bulk data: github.com/sinofacts/dataset (a periodic snapshot; the live site is always more current).