Author: Chrisgone Adede, Senior Researcher, Data Science
Community health workers (CHWs) are increasingly recognized for increasing access to health care services, promoting care-seeking behaviors, and improving clinical outcomes, given their proximity to the communities they serve. Digital tools have been noted to offer an opportunity to enhance the coverage and quality of health services, particularly the essential services provided by CHWs.
Despite research showing that digitally supported CHWs accelerate the delivery of healthcare, there is continued mistrust in the quality of data collected by CHWs. This mistrust of community-led health data by decision-makers suggests that the potential of community health data to bring about quality improvements will not be realized.
In Medic's experience, this mistrust has curtailed initiatives like precision community health, which leverage data and analytics to deliver the right intervention at the right time to the right population. Data is considered to be of good quality when it can serve its intended purpose and support decision-making. In the context of precision community health, that means stakeholders can express confidence in decisions based on evidence derived from the data. Good quality data is therefore up to date, measures the intended quantity correctly and comprehensively, and does not contradict alternative sources.
Achieving data quality at the platform level
Good quality data is key to Medic in its mission to advance good health, human flourishing, and equitable care for and with the hardest-to-reach communities through digital tools that help health workers deliver just, quality care to their neighbors and community. Medic aims, through the human-centered design (HCD) Medic Labs, to incubate long-term breakthrough ideas in data science and precision community health. As technical stewards of the Community Health Toolkit (CHT), Medic works with our partners to improve data quality at the platform level and therefore across all CHT deployments.

Data from digital health platforms finds its way into national health information systems (HIS) like the District Health Information Software 2 (DHIS2). Driving data quality initiatives at the platform level will not only ensure that quality data reaches the national HISs, but also offer other in-platform benefits, including building shared approaches within the community of practice, increasing efficiency through regular rather than sporadic quality checks, reducing redundant technical effort, and producing extensible data integrity tools.
Partnerships for tooling and scaling
Communities of practice like the CHT community are naturally collaborative in their approaches to solving common problems. Developing a data quality toolkit requires partnerships in two respects: developing the tools and scaling them to multiple deployments. Our partners possess domain understanding, collaborate with multiple stakeholders in the health ecosystem, and are compatible with the technology choices of other partners. Our technical partner, DataKind, leads the tooling effort, while testing will be done in collaboration with our implementing partners across different countries. With partners, we expect to scale with data quality embedded as we develop a platform-wide culture of data trust.
Accomplishments and into the future
In 2021, in collaboration with DataKind, Medic developed 160 tests against a single CHT deployment as part of an exploratory analysis. The analysis sought to identify inconsistent or problematic (IoP) data from a digital CHW health system and recommended a platform-wide, tool-agnostic approach to data quality in community health. Subsequent work in Q4 2021 extended the exploratory data analysis (EDA) to an additional CHT deployment to better understand the types of data quality issues and their pervasiveness. Although the EDA covered the period from January to August 2021, the data from this deployment contained more than eight million home visits conducted by 424 CHWs, reaching 370,000 households since 2017. The outputs of this subsequent analysis mirrored those of the initial work. Some of the IoPs identified were:
- Incomplete data in key fields, such as the CHW-entered “date of visit,” which was missing for 90% of home visits.
- Long lags, in some cases on the order of years, between the “date of visit” and the system-generated “reported date” for some home visits (Figure 1).
- Some pregnancies recorded as detected within days of the last menstrual period (LMP), or even on the same day.
- Multiple records of suspected COVID-19 cases for the same patient, some recorded on the same day.
Despite the similarities, an interesting difference was the occurrence of instrument calibration errors that only manifested before 2021 in the subsequent analysis, while the previous analysis reported instrument calibration errors across the years.
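To make the IoP scenarios listed above more concrete, the following is a minimal sketch of how such checks might be automated. It is not the toolkit developed by Medic and DataKind. The `HomeVisit` record, its field names, the `check_visits` function, and both thresholds are hypothetical and chosen only for illustration; they do not reflect the actual CHT data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# Hypothetical thresholds, chosen for illustration only.
MAX_REPORTING_LAG_DAYS = 30       # allowed gap between visit and report
MIN_DAYS_LMP_TO_DETECTION = 14    # minimum plausible gap between LMP and detection


@dataclass
class HomeVisit:
    """Hypothetical, simplified home-visit record (not the CHT schema)."""
    visit_id: str
    patient_id: str
    reported_date: date                    # system-generated
    date_of_visit: Optional[date] = None   # entered by the CHW
    lmp_date: Optional[date] = None        # set only on pregnancy records
    suspected_covid: bool = False


def check_visits(visits: List[HomeVisit]) -> Dict[str, List[str]]:
    """Return the visit_ids flagged by each illustrative check."""
    flags: Dict[str, List[str]] = {
        "missing_date_of_visit": [],
        "long_reporting_lag": [],
        "pregnancy_too_close_to_lmp": [],
        "duplicate_covid_same_day": [],
    }
    # Count suspected COVID-19 records per patient per reported day.
    covid_counts = Counter(
        (v.patient_id, v.reported_date) for v in visits if v.suspected_covid
    )
    for v in visits:
        if v.date_of_visit is None:
            flags["missing_date_of_visit"].append(v.visit_id)
        elif (v.reported_date - v.date_of_visit).days > MAX_REPORTING_LAG_DAYS:
            flags["long_reporting_lag"].append(v.visit_id)

        if v.lmp_date is not None:
            # Fall back to the reported date when the CHW left the visit date blank.
            detected_on = v.date_of_visit or v.reported_date
            if (detected_on - v.lmp_date).days < MIN_DAYS_LMP_TO_DETECTION:
                flags["pregnancy_too_close_to_lmp"].append(v.visit_id)

        if v.suspected_covid and covid_counts[(v.patient_id, v.reported_date)] > 1:
            flags["duplicate_covid_same_day"].append(v.visit_id)
    return flags


# Made-up records, one or two per scenario.
visits = [
    HomeVisit("v1", "p1", date(2021, 3, 1)),
    HomeVisit("v2", "p2", date(2021, 3, 1), date_of_visit=date(2019, 3, 1)),
    HomeVisit("v3", "p3", date(2021, 3, 1), date_of_visit=date(2021, 3, 1),
              lmp_date=date(2021, 3, 1)),
    HomeVisit("v4", "p4", date(2021, 3, 2), date_of_visit=date(2021, 3, 2),
              suspected_covid=True),
    HomeVisit("v5", "p4", date(2021, 3, 2), date_of_visit=date(2021, 3, 2),
              suspected_covid=True),
]
report = check_visits(visits)
```

Returning flagged record identifiers per check, rather than a single pass/fail result, keeps the output usable for reporting and lets each deployment decide which checks to prioritize.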

The EDA was augmented by HCD-driven, Medic-led user experience (UX) research in which users outlined their pain points and assigned priorities to the identified IoP scenarios. The synthesis of the extended EDA and the UX research has given Medic the basis for the ongoing development of an easy-to-configure, generic data quality toolkit that automates the execution and reporting of data quality checks across multiple deployments of the CHT, and potentially beyond the CHT. Medic and the CHT community are advancing work on data quality to support innovative ways of improving health care, including precision community health.
Platform-level data trust will drive confidence in the use of the data collected by CHWs for decision-making. With quality data at the platform level, our precision community health initiatives will support CHWs to promptly and proactively deliver precisely targeted, just, and quality care where it is needed most.