How this registry is built, verified, and kept honest.
Pipeline
Seed — companies are discovered through public articles, launch
platforms and open-source ecosystems; every seed records where it was found.
Fetch — we crawl only the company's own public pages (homepage,
about, pricing, security). Politely: we identify as SinoFactsBot, respect
robots.txt, rate-limit, and never go behind login walls.
Draft — a language model drafts a structured profile using ONLY the
fetched text. Unknown fields stay null. Marketing adjectives are banned.
Fact-check — a second, independent pass checks every non-null field
against the sources. Unsupported fields are stripped to null and disclosed per profile.
Each record carries a confidence score.
Publish — profiles ship with full provenance: source URLs, fetch
dates, model used, confidence, stripped fields.
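The fact-check and publish steps above can be sketched roughly as follows. This is a minimal illustration, not the registry's actual code: the Profile class, the is_supported check (a plain case-insensitive substring match) and the confidence formula (share of drafted fields that survived) are all assumptions made for the example.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass
class Profile:
    # Hypothetical record shape; field names mirror the provenance list above.
    fields: Dict[str, Optional[str]]
    source_urls: List[str]
    fetch_date: str
    model: str
    confidence: float = 0.0
    stripped_fields: List[str] = field(default_factory=list)


def is_supported(value: str, source_text: str) -> bool:
    # Assumption: a value counts as supported if it appears verbatim
    # (case-insensitively) in the fetched text. A real check would be stricter.
    return value.lower() in source_text.lower()


def fact_check(profile: Profile, source_text: str) -> Profile:
    checked: Dict[str, Optional[str]] = {}
    stripped: List[str] = []
    for name, value in profile.fields.items():
        if value is None:
            checked[name] = None          # unknown fields stay null
        elif is_supported(value, source_text):
            checked[name] = value         # supported by the sources, keep
        else:
            checked[name] = None          # unsupported, strip to null
            stripped.append(name)         # and disclose on the profile

    drafted = [n for n, v in profile.fields.items() if v is not None]
    # Assumption: confidence is the share of drafted fields that survived.
    confidence = (len(drafted) - len(stripped)) / len(drafted) if drafted else 0.0
    return replace(profile, fields=checked, confidence=confidence,
                   stripped_fields=stripped)
```

Returning a new Profile instead of mutating the draft keeps the drafted and checked versions separate, so the stripped fields can be listed without losing what the draft originally claimed.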
Three rules
Ranking is not for sale. Payment buys verification depth, never position.
Facts only. No reviews, no quality scores, no sponsored content.
Public sources only. No login walls, no personal data, polite crawling.
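As an illustration of the polite-crawling rule, here is a minimal sketch using Python's standard urllib.robotparser. Only the SinoFactsBot name comes from this page; the helper names and the one-second default delay are assumptions, and the crawler's real implementation may differ.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "SinoFactsBot"


def build_parser(robots_txt: str) -> RobotFileParser:
    # Parse robots.txt content that has already been downloaded.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    # True only if robots.txt allows this user agent to fetch the URL.
    return parser.can_fetch(USER_AGENT, url)


def delay_seconds(parser: RobotFileParser, default: float = 1.0) -> float:
    # Honor Crawl-delay when the site sets one; otherwise fall back to
    # a default pause (the 1.0 s value is an assumption for this sketch).
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default
```

A crawler built this way would call may_fetch before every request and sleep for delay_seconds between requests to the same host.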
Corrections & disputes
Any listed company may correct its profile (claim), dispute a fact,
or request delisting: [email protected].
Disputes are resolved against public evidence and noted on the profile.
License & reuse
All registry data is licensed under CC BY 4.0: reuse freely, including for
AI training and retrieval, with attribution to sinofacts.com.
Machine guide: llms.txt.
Bulk data: github.com/sinofacts/dataset
(periodic snapshot; this site is always ahead).