This month, an OECD delegation travelled to Geneva for the UNECE Workshop on Generative AI & Official Statistics (12-14 May 2025). Through discussions, case studies, and interactive sessions, the event aimed to provide a platform for professionals and stakeholders to exchange knowledge, showcase practical applications and address the governance, ethical and infrastructural challenges of integrating GenAI in official statistics.
For the Statistical Information System Collaboration Community (SIS-CC), the meeting was particularly timely: many of the pilots on show build directly on the SDMX standard and on the open-source tooling that SIS-CC has long maintained.
The Role of Generative AI in Generating Official Statistics
To kick off the workshop, OECD Chief Statistician Steve MacFeely delivered a compelling keynote, arguing that while AI, like data, is now ubiquitous, statisticians often feel like they’re struggling to keep pace with innovation. Three take-aways stood out:
- From code to co-pilot: GenAI is fast becoming an essential part of an NSO’s toolbox, a silent partner able to generate, document, translate and debug code so that statisticians can focus on insight rather than syntax.
- Radical availability, responsibly governed: AI can widen access to data, but only if we build a democratically accountable information architecture that counters echo-chambers and keeps provenance, quality and confidentiality front-of-mind.
- Guarding against a “data winter”: In some countries, we are now witnessing the systematic removal of public and official statistics, long understood as open, public goods. These data, and their metadata, could be a bedrock for AI training, so long as they are made machine-readable, with permissive licensing and clear terms of use.
Experimenting across the data lifecycle
In a later session, Eric Anvar and François Fonteneau translated those strategic messages into practice, outlining how the OECD is experimenting with AI across the entire data lifecycle under its Smart Data Strategy. Their presentation highlighted how our shared SDMX layer is enabling GenAI experimentation from data collection to dissemination:
- Curation & quality – AI-assisted anomaly detection is trimming manual checks and freeing statisticians to investigate genuine outliers.
- Metadata enrichment – Large language models (LLMs) classify and cluster policy texts, survey answers and administrative records, producing richer SDMX-aligned metadata that improve discoverability and re-use.
- Natural-language access (“SDMX + AI”) – By coupling LLMs with the semantics already embedded in SDMX structures, analysts – and eventually the public – can interrogate complex datasets in plain language, a step towards radical availability.
- Pair-programming at scale – OECD developers use GenAI to translate, document and test code, but always behind safeguards for privacy, intellectual property and reproducibility.
- Dissemination experiments – New pilots, run jointly with the SIS-CC, generate narrative summaries and custom visualisations on the fly, while specialised resources (for example in health) test how GenAI can tailor messages to domain users.
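To make the "SDMX + AI" idea more concrete, here is a minimal Python sketch of how a plain-language question might be mapped onto an SDMX data key. It is an illustration only, not the OECD's implementation: the dimension names, the codelists and the helper functions `match_codes` and `build_key` are all hypothetical, and a simple substring match stands in for the LLM step. The only part taken from the SDMX REST convention is the key syntax, where dimensions are separated by `.`, alternative codes by `+`, and an empty position acts as a wildcard.

```python
# Hypothetical dimension order and codelists, invented for illustration only.
DIMENSIONS = ["REF_AREA", "MEASURE", "FREQ"]
CODELISTS = {
    "REF_AREA": {"FRA": "France", "DEU": "Germany", "JPN": "Japan"},
    "MEASURE": {"UNEMP": "Unemployment rate", "GDP": "Gross domestic product"},
    "FREQ": {"A": "Annual", "Q": "Quarterly"},
}


def match_codes(question):
    """Return, per dimension, the codes whose label appears in the question.

    A plain substring match stands in here for the LLM step (assumption).
    """
    text = question.lower()
    return {
        dim: [code for code, label in CODELISTS[dim].items()
              if label.lower() in text]
        for dim in DIMENSIONS
    }


def build_key(matches):
    """Build an SDMX REST-style key: '+' between codes, '.' between dimensions.

    A dimension with no matched code is left empty, which acts as a wildcard.
    """
    return ".".join("+".join(matches[dim]) for dim in DIMENSIONS)


question = "Quarterly unemployment rate for France and Germany"
key = build_key(match_codes(question))
print(key)  # FRA+DEU.UNEMP.Q
```

The point of the sketch is that the labels already attached to SDMX codes give a language model (or, here, a crude matcher) a controlled vocabulary to resolve against, so that the resulting query stays within the structure the data provider has defined.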
Underpinning all of this is an exploratory, bottom-up governance model: rather than wait for a monolithic framework, teams share lightweight guidelines, pool test results and iterate quickly – an approach that mirrors the SIS-CC ethos of collaborative, open development.
We invite all SIS-CC members to join this conversation, test the prototypes and share feedback. Together we can ensure that generative AI amplifies not just the value of official statistics, but also the values—openness, quality, trust—that underpin our work.