Strengthening Data Management Practices
NIS Cambodia Case Study
NOTE: Please see the Community blog for an update on this project as of May 2019.
The Sustainable Development Goals (SDGs) were adopted in the midst of a data revolution – an explosion not only in the volume of data available, but also of demand for access to it. Taken together, the evolving data ecosystem and the commitments to monitor Agenda 2030’s ambitious goals are increasing the pressure on countries – to not only produce more and better data, but also share it more effectively.
To meet these demands, national statistics offices need to become more efficient and self-reliant, while expanding statistical capacity in ways that challenge the traditional understanding of the concept. Beyond producing data, measuring the effectiveness of policies and sharing data are becoming increasingly important. To ensure that data users get what they need, to inform a public increasingly poised to become a powerful force for development, and to fulfil SDG reporting requirements, countries need new and diversified ways of communicating statistical information.
Currently, however, many developing countries lack fully functional platforms for data dissemination and reporting, much less ones that are up to the task of reporting on SDG indicators. None of the solutions that have emerged so far is sustainable over the longer term for a large number of countries. DevInfo, the data dissemination platform developed for the Millennium Development Goals (MDGs) and used in about 120 countries, is but a partial solution that still struggles with long-term sustainability. With its impending sunset at the end of 2018, an alternative needs to be found for these countries.
There is an array of other tools, data portals, and platforms emerging, but most fall short in meeting the needs of many countries. This was further highlighted by the study on national data portals published by PARIS21 in 2016, which concluded that, while there have been many well-intentioned efforts to make data portals available to countries, the outcomes are rather mixed, particularly in the most aid-dependent countries.
Better data portals, moreover, cannot by themselves solve the problem. Unfortunately, few existing dissemination solutions touch upon the preceding phases of statistical processes, from design to collection through to processing. Indeed, in many countries, different departments or organisations are responsible for these various stages, each working in isolation and without coordination. Data dissemination that stems from such a fragmented approach does not serve data users well.
While information technologies have great potential to improve dissemination, simply applying digital tools is not enough if the underlying processes that produce and manage the data remain inefficient and uncoordinated. The new environment requires renewed commitments to bring about real change in statistical capacity and in our roles in promoting the shifting paradigm. These include greater harmonisation among international partners, integration of country statistical processes, and countries assuming fiscal responsibility for seeking cost-effective solutions and self-reliance in a sustaining regional setting. Against this backdrop, statistical capacity development demands radical rethinking. It must bring about transformation that is generated, guided and sustained over time from within countries. National Statistics Offices that harness the data revolution well stand to gain by following the Capacity Development 4.0 framework, whose overarching driving forces are digital, integration and partnerships.
A strategic approach that develops capacities in developing countries in a sustainable manner is therefore required if countries are to maximise the return on investment in technology solutions applied to statistical processes.
Piloting the new approach
To test a new approach to tackling and reversing the current situation, a pilot project was launched at the beginning of 2018. The pilot implemented some new ideas that the SIS-CC has been experimenting with in response to this paradigm shift: increasing sustainability and ensuring an integrated approach that initially meets an NSO's SDG reporting requirements, but that can be built upon to achieve complete and consistent end-to-end data lifecycle management. It also started to address the currently fragmented approach to data dissemination and to create a more coordinated effort and greater harmonisation by bringing several players together, including technology providers, capacity development experts, regional and international partners, and country/NSO representatives.
This project was led by a main team consisting of resources from the Partnership in Statistics for Development in the 21st Century (PARIS21), the Statistical Information System Collaboration Community (SIS-CC) with operational activities delegated to the Statistics and Data Directorate (SDD) of the Organisation for Economic Co-operation and Development (OECD), and the United Nations Children’s Fund (UNICEF), bringing technical expertise and existing in-country support related to DevInfo. In addition, to ensure greater coordination among international partners, the United Nations Statistics Division (UNSD) also participated, providing support in the areas of SDG reporting and SDMX training, and discussing ongoing support for the pilot countries in the area of data modelling.
“SDMX facilitates for a standardised full data lifecycle, and guides data modelling to ensure a coherent output suitable for consumption by data portals and visualisations.”
The pilot project had three key objectives:
- Assessment and planning to identify and define the target product vision, the level of capacity development required for data preparation, and the target host environment.
- Training and capacity development on relevant tools and data modelling; and on using the existing DevInfo data for loading into .Stat Suite.
- Solution delivery to put in place the new .Stat Data Explorer for NIS Cambodia to use.
The Assessment and Planning phase was the first and most important step in identifying and defining the target product vision for the pilot, the level of capacity development required for data preparation, and the target host environment.
This covered four key areas:
- Assessment of the organisation's current resource capacity to identify the gaps that determine its capability to meet the target environment and improved data workflows;
- Assessment of current ICT infrastructure to define the target environment and support capabilities;
- Assessment of the current product landscape (DevInfo deployment and integration) including identifying existing constraints, complexities such as integration with other systems, and gaps in fulfilling the complete end-to-end data lifecycle;
- Assessment of the current data processes to prepare, map, and load data into the DevInfo database, with a mapping to well-established statistical models, including the GSBPM, as well as the level of compliance with internationally agreed standards, namely SDMX.
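The mapping of data processes to the GSBPM mentioned in the last point can be pictured with a small sketch. It is only an illustration under stated assumptions: the phase names are the eight phases of GSBPM version 5, while the list of observed process steps and the helper function `uncovered_phases` are hypothetical and do not reflect the actual findings or tooling of the assessment.

```python
# The eight phases of the GSBPM (version 5), in order.
GSBPM_PHASES = [
    "Specify Needs", "Design", "Build", "Collect",
    "Process", "Analyse", "Disseminate", "Evaluate",
]

# Hypothetical inventory of observed process steps, each tagged with a phase.
# These entries are invented for illustration only.
observed_steps = [
    ("Receive data files from line ministries", "Collect"),
    ("Enter values manually into the dissemination database", "Process"),
    ("Publish indicators on the data portal", "Disseminate"),
]

def uncovered_phases(steps):
    """Return the GSBPM phases that no observed step has been mapped to."""
    covered = {phase for _, phase in steps}
    unknown = covered - set(GSBPM_PHASES)
    if unknown:
        # A step tagged with a non-GSBPM label points to a mapping error.
        raise ValueError(f"Not a GSBPM phase: {sorted(unknown)}")
    return [phase for phase in GSBPM_PHASES if phase not in covered]

gaps = uncovered_phases(observed_steps)
print(gaps)
```

A list of uncovered phases like this does not by itself indicate a weakness, since some phases may be handled by another organisation, but it gives the assessment a structured starting point for discussing gaps in the end-to-end data lifecycle.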
Only upon completion of the assessment could a more detailed plan be created, confirming the different components that were part of the final delivery.
“GSBPM provides the conceptual framework for Full Data Lifecycle.”
What was the rationale for the pilot and why Cambodia? As already mentioned, many developing countries currently lack fully functional platforms for data dissemination and reporting. None of the solutions that have emerged so far are comprehensive (technically) and sustainable (financially) over the longer term for a large number of countries.
Cambodia was chosen because it is a current DevInfo user that had shown interest in moving away from DevInfo and wanted to minimise manual data entry in its online data dissemination efforts. It agreed to the pilot when requested jointly by UNICEF, OECD and PARIS21.
The project took shape amidst the following three concurrent developments:
- DevInfo, the data dissemination platform developed for the MDGs (supported by UNICEF) and used in about 120 countries, would no longer be supported from the end of 2018, so an alternative for these countries needed to be found.
- SIS-CC (under the guidance of OECD/SDD) is developing an SDMX-based, modular and open source data dissemination platform that covers the full data lifecycle, but which had so far not been tested in a low-capacity situation.
- PARIS21, in its study on national data portals published in 2016, noted that there is an array of data portals and platforms emerging, but that most fall short in meeting the real needs of many NSOs, and advocated for a comprehensive assessment of data flows to identify the challenges and then propose a solution that is fit for purpose.
Over the course of the project, the team worked with NIS staff to assess the existing data flow mechanisms with which the NIS interfaces and which result in the dissemination of official statistics, along with infrastructure and capacity needs.
With assistance from subject-matter experts, the mission team modelled 36 indicators from the education domain and 13 indicators from the demography domain. As of now, around 50,000 observation values have been migrated to the .Stat Data Explorer, hosted in a pilot environment running in the cloud.
A key objective of the project was to identify a suitable replacement for CamInfo (DevInfo) and to facilitate the dissemination of existing CamInfo data through this new solution. Following a second mission to NIS Cambodia, this objective was met, with all CamInfo data successfully extracted in spreadsheet format, ready for cleaning and modelling. Together with the NIS, the team was able to identify and model all CamInfo demography indicators into a common structure using standard code lists (Areas, Age groups, Units, etc.). The demography data was cleaned and normalised, and the data structures, code lists, and data were then uploaded to .Stat using the Data Lifecycle Manager (DLM). A key success factor was the use of new, intuitive tooling: OpenRefine (data cleaning, in addition to a whiteboard), DSD Constructor (creation of Data Structure Definitions and management of code lists, etc.), and the .Stat DLM (data and structure loading and dissemination).
It is safe to say that data management was put in the hands of subject matter experts, with staff suitably trained.
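The cleaning and modelling step described above can be pictured with a minimal Python sketch. It is not the project's actual tooling, which relied on OpenRefine, DSD Constructor and the .Stat DLM. The code lists, column names, SDMX-style dimension names and sample values below are all hypothetical, chosen only to show how free-text spreadsheet labels might be mapped to standard codes and reshaped into one observation per row.

```python
import csv
import io

# Hypothetical code lists (illustrative only, not the actual NIS code lists).
AREA_CODES = {"cambodia": "KH", "phnom penh": "KH-12"}
AGE_CODES = {"0-4": "Y0T4", "5-9": "Y5T9", "total": "_T"}

# Hypothetical wide-format extract, as it might look in a spreadsheet export.
RAW = """Area,Age group,2015,2016
 Cambodia ,0-4,100,110
Phnom Penh,Total,20,
"""

def normalise(raw_csv, years=("2015", "2016")):
    """Turn wide rows (one column per year) into one coded observation per row."""
    observations, errors = [], []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Clean the free-text labels, then look them up in the code lists.
        area = AREA_CODES.get(row["Area"].strip().lower())
        age = AGE_CODES.get(row["Age group"].strip().lower())
        if area is None or age is None:
            errors.append(row)  # label missing from a code list: review by hand
            continue
        for year in years:
            value = (row[year] or "").strip()
            if value:  # skip empty cells rather than inventing a value
                observations.append({
                    "REF_AREA": area,
                    "AGE": age,
                    "TIME_PERIOD": year,
                    "OBS_VALUE": float(value),
                })
    return observations, errors

obs, errs = normalise(RAW)
for o in obs:
    print(o)
```

Rows whose labels cannot be matched are set aside rather than guessed at, which mirrors the idea that subject matter experts, not the tool, decide how ambiguous values are coded.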
The project exposed a number of weaknesses in current global approaches and highlighted areas that must be addressed if National Statistical Offices are to become sustainable in the long term.
- Strengthening data management capacity is foundational for proper indicator monitoring and reporting and an efficient data management pipeline.
- National indicators are oriented toward national plans, which do not automatically ensure availability of SDG data.
- SDGs should be seen as a by-product of a wider capacity development effort and data modelling exercise, not as the main focus.
- The NSO must own the implementation plan by leveraging existing networks and partners who are already working in the country, and understand how the National Statistical System works.
- Working with Subject Matter Experts is key to mainstreaming sustained data modelling practices, rather than simply handing over responsibility for data dissemination to an IT department that does not necessarily have the expertise and understanding of the data.
- Although the technical solution is a key element to support a much improved end-to-end data flow, the majority of the effort must focus on capacity development.
- Tools are only one part of the PPP equation: People, processes, products.
- A sustained period of continual engagement is needed if any solution, and a standardised data modelling (SDMX) approach, is to take hold and provide a new solution to indicator management.
- A cloud solution for dissemination is only part of the answer; there is also a need for an internal data management infrastructure to support local artefact and data management, with versioning and security.
“Standards is fundamental to develop the conceptual framework to build a shared knowledge by bringing together a community of practitioners and statistical experts.”
As part of the project deliverables, a methodology was developed to help conduct in-country needs assessments successfully. Its key objectives are: a sound understanding of the institutional structures and key processes around data production and dissemination in the NSO; an in-depth study of the IT strategy, tools and capacities of the NSO, to ensure that the solution will build on those processes; the technical specifications for a customised solution that fully meets the needs of the NSO; and a high-level project plan describing the timelines and actions required to implement the solution.
This methodology, named the Data Flow Assessment Framework (DFAF), takes the form of a document containing the list of tools and guidance necessary to carry out a needs assessment. It provides the methodological steps and tools to produce a sound needs assessment in preparation for future implementations of an integrated solution such as .Stat Suite in a given country, and includes a suggested outline for the needs assessment report and a template for the project document.
Jonathan Challener, Partnerships and Community Manager, OECD
Rajiv Ranjan, Technical Programme Advisor, PARIS21
Yves Jaques, Project Manager, UNICEF
- Making Data Portals work for SDGs: A view on deployment, design and technology, Partnership in Statistics for Development in the 21st Century (PARIS21), Discussion Paper No. 8, April 2016. (https://www.paris21.org/sites/default/files/Paper_on_Data_Portals wcover_WEB.pdf)
- Capacity Development 4.0 framework (http://paris21.org/capacity-development-40)
- Data Flow Assessment Framework (DFAF) is to guide the process and inform decisions going forward.