ISB Blog

By Alekhya Chaparala, Research Assistant, MIHM

On a cool day last December, I visited the Kopaganj Community Health Center (CHC) in Uttar Pradesh’s Mau district. Implementation of an MIHM costing study on family planning was about to begin, and I was in the field to finalize operations and brief local officials before data collection commenced the following week. My work largely involved traveling between our study districts and tracking down various district health officials. At any given moment these officials could be in their offices, on a site visit, or performing a surgery – which is how I ended up at the CHC in Kopaganj, looking for the Chief Medical Officer (CMO) of Mau. Kopaganj was not one of our originally selected study facilities. The facilities for the costing study had been selected based on service delivery data recorded online in the Uttar Pradesh Health Management Information System (UPHMIS) portal, and delivery of female sterilization had been a priority inclusion criterion.

We had originally selected the nearby Ratanpura CHC for our study, which had performed the most female sterilizations in the past year based on UPHMIS data. However, upon reaching Kopaganj, I quickly learned that this was not the case. I found the CMO sitting in a packed administrative office with a steady stream of staff flowing in and out. He beckoned me to sit next to him, and I quickly explained the purpose of my visit before we could get interrupted by one of the many people vying for his attention. When I mentioned that we planned to collect data from Ratanpura CHC, he frowned. “Why are you studying Ratanpura? Study this facility,” the CMO said, gesturing emphatically. I tried to explain that we had already selected Ratanpura, but he continued insisting. “No, no, this is a better facility. You will get better data here.” Perhaps the Ratanpura facility was poorly maintained, I thought, and he worried it would reflect badly on his team.

“We are looking for the facilities that perform the most female sterilization,” I explained. “Haan, then study here!” the CMO responded, tapping the desk in front of him. “Female sterilization is performed only here.” What? How could that be? “What do you mean?” I asked. “This is the only CHC which performs female sterilization,” he repeated. “In the whole district?” “Yes. I know because I am the only one who performs sterilization!” He laughed at my confusion, finally delivering the punchline. “But that’s not what the UPHMIS data said,” I attempted weakly, realizing even before I finished that the data we had originally consulted was likely very flawed.

UPHMIS data is facility-reported, after all, leaving plenty of opportunity for embellishment. I looked across the room at a large PC where the facility’s designated data operator sat, and wondered if we were about to embark on a data-collecting mission in all the wrong facilities. After ten minutes of back-and-forth questions with the CMO, I accepted that we would have to swap out Ratanpura for Kopaganj in our study. As an early career researcher, this was my first experience with what seasoned researchers know well – that data is messy, flawed, and often misrepresentative of reality. Despite having carefully pored over UPHMIS data for dozens of facilities and health indicators before selecting the study facilities, I suddenly felt as if we had been flying blind the whole time.

Although our study was a relatively small-scale one, breakdown in data quality is not an uncommon occurrence. Primary data forms the bedrock of social services – implementing, assessing and recalibrating public programs requires an understanding of what is happening on the ground. And when the story we see is incomplete or inaccurate, the decisions we make about how to fund, expand or implement social programs can lead to resources being allocated in ways that ultimately prove ineffective, or even harmful. The National Family Health Survey (NFHS) is a larger example of this. As the largest source of primary health data in India, NFHS provides data upon which thousands of health-related research studies, public programs and policy decisions are based. Yet NFHS’s well-documented weaknesses — including a lengthy questionnaire and poorly trained and compensated field researchers — pose a significant vulnerability to India’s public health sector, which has built its programs largely based on NFHS findings for the last twenty-plus years.

So what do we do when the data we’re working with is flawed? How do we implement solutions when we don’t fully know the scope and details of the problem? There is no perfect solution — how large the gap between data and reality is for any given dataset is, by nature, impossible to fully ascertain. Instead, we must commit to thorough investigations of data quality at every point of the research process, and recognize current limitations as a jumping-off point for future study.

Firstly, investigators should be mindful of the challenges of data collection, and be amenable to accommodations in study protocol which allow the most accurate data possible to be captured. Likewise, field researchers should have not just proper training but also an investment in the study, in order to understand when and how to make these accommodations in real time. And lastly, policy-makers, social program implementors and other stakeholders who base their decision-making on ground-level evidence should allow a buffer for the dynamic, imperfect nature of primary research, and likewise work to improve conditions for program data collection wherever possible. Data tells a story, translating the world around us into building blocks for growth and innovation. Much can get lost in translation, however, and it is up to us as researchers and program architects to constantly question and clarify the message we receive, in order to gather the clearest, most holistic picture of the truth possible.

Perspectives from ISB

Building With Imperfect Tools: Notes from the Field on Data Quality and Uncertainty
