Methodology

How this registry is built, verified, and kept honest.

Pipeline

Seed — companies are discovered through public articles, launch platforms and open-source ecosystems; every seed records where it was found.
Fetch — we crawl only the company's own public pages (homepage, about, pricing, security). Politely: we identify as SinoFactsBot, respect robots.txt, rate-limit, and never go behind login walls.
Draft — a language model drafts a structured profile using ONLY the fetched text. Unknown fields stay null. Marketing adjectives are banned.
Fact-check — a second, independent pass checks every non-null field against the sources. Unsupported fields are stripped to null and disclosed per profile. Each record carries a confidence score.
Publish — profiles ship with full provenance: source URLs, fetch dates, model used, confidence, stripped fields.
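The fact-check and publish steps above can be sketched roughly as follows. This is a minimal illustration, not the registry's actual code: the `Profile` class, its field names, and the `strip_unsupported` helper are hypothetical, and a plain case-insensitive substring match stands in for the second, independent checking pass.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Profile:
    # Hypothetical schema; the registry's real field names are not given here.
    fields: dict[str, Optional[str]]
    source_urls: list[str]
    stripped_fields: list[str] = field(default_factory=list)


def strip_unsupported(profile: Profile, source_text: str) -> Profile:
    """Null out every non-null field whose value does not appear in the sources.

    Assumption: a case-insensitive substring match is used purely as a
    stand-in for the independent fact-check pass.
    """
    haystack = source_text.casefold()
    checked: dict[str, Optional[str]] = {}
    stripped: list[str] = []
    for name, value in profile.fields.items():
        if value is not None and value.casefold() not in haystack:
            # Unsupported claim: strip to null and record it for disclosure.
            checked[name] = None
            stripped.append(name)
        else:
            # Supported values and already-null fields pass through unchanged.
            checked[name] = value
    return Profile(checked, list(profile.source_urls), stripped)


# Illustrative data only.
draft = Profile(
    fields={"name": "Example Co", "founded": "2019", "headcount": None},
    source_urls=["https://example.com/about"],
)
result = strip_unsupported(draft, "Example Co builds developer tools.")
```

In this sketch the `founded` value has no support in the fetched text, so it is set to null and listed in `stripped_fields`, while `headcount` was never filled in and is therefore not reported as stripped.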

Three rules

Ranking is not for sale. Payment buys verification depth, never position.
Facts only. No reviews, no quality scores, no sponsored content.
Public sources only. No login walls, no personal data, polite crawling.
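As an illustration of what polite crawling can look like in practice, here is a small sketch using Python's standard `urllib.robotparser`. The `SinoFactsBot` user agent comes from the pipeline description; the sample robots.txt, the helper names `may_fetch` and `crawl_delay`, and the one-second default delay are assumptions made for the example, not the crawler's actual settings.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "SinoFactsBot"

# Sample robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 5
"""


def _parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True only if the robots.txt rules allow this agent to fetch the URL."""
    return _parser_for(robots_txt).can_fetch(user_agent, url)


def crawl_delay(robots_txt: str, user_agent: str = USER_AGENT,
                default: float = 1.0) -> float:
    """Seconds to wait between requests; falls back to an assumed default."""
    delay = _parser_for(robots_txt).crawl_delay(user_agent)
    return float(delay) if delay is not None else default
```

A crawler built this way would call `may_fetch` before every request and sleep for `crawl_delay` seconds between requests to the same host. A robots.txt check does not detect login walls by itself; that rule needs a separate check.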

Corrections & disputes

Any listed company may claim and correct its profile, dispute a fact, or request delisting: [email protected]. Disputes are resolved against public evidence, and the outcome is noted on the profile.

License & reuse

All registry data is licensed CC BY 4.0: reuse freely, including for AI training and retrieval, with attribution to sinofacts.com. Machine guide: llms.txt. Bulk data: github.com/sinofacts/dataset (periodic snapshot; the live site is always more current than the snapshot).