Advances in Information and Communication Technologies for Development (ICT4D) and data science have paved the way for a systematic, reproducible, and scalable approach to data cleaning in routine health information systems. A logic model was developed for data cleaning, incorporating an algorithm for screening, diagnosing, and editing datasets in a rule-based, interactive, and semi-automated manner. This model was applied to the dengue line-list of the National Vector Borne Disease Control Programme in Punjab, India from 2015 to 2019. The study cleaned the data and imputed missing values to yield an estimated date for over 96% of records for 2015 and 98.9% for 2016. All cases from 2017 to 2019 were cleaned and imputed. Information on age and sex was extracted from more than 98.4% and 99.4% of records, respectively. Applying the logic model produced an analysis-ready dataset that can be used to understand spatiotemporal epidemiology and support data-driven public health decision-making. The study has two main limitations. First, understanding the underlying reasons behind data anomalies was outside its scope and needs further research. Second, the algorithm was developed on a single disease dataset, so additional screening mechanisms will be needed when it is applied to other diseases or program datasets. Despite these limitations, the study's strength lies in its reproducible and scalable logic algorithm for preprocessing routine health surveillance data. This approach lets researchers analyze existing datasets efficiently and gain insight into disease epidemiology over time and space. Because the algorithm is scalable, it could be applied beyond dengue surveillance in India to other diseases globally.
Moving forward, future studies should explore additional variables relevant to health program managers and researchers while ensuring external generalizability of the algorithm across diverse contexts. By incorporating robust ICT4D principles into research practices, sustainable improvements in data quality can be achieved, ultimately contributing to better-informed public health interventions worldwide.
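The screening, diagnosis, and editing steps can be illustrated with a minimal sketch. The source does not list its actual rules, so everything here is an assumption made for illustration: the field name `test_date`, the three date formats, the status labels, and the rule that a date must fall in the reporting year. Records that fail a rule are flagged for manual review rather than changed silently, which is one way to read "interactive and semi-automated".

```python
from datetime import datetime

# Assumed input formats; the source does not say how dates were recorded.
DATE_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%Y-%m-%d")


def parse_date(text):
    """Return a date if the text matches one of the assumed formats, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def diagnose(record, reporting_year):
    """Screening and diagnosis: classify the hypothetical test_date field."""
    raw = record.get("test_date")
    if raw is None or not str(raw).strip():
        return "missing"
    parsed = parse_date(str(raw))
    if parsed is None:
        return "unparseable"
    if parsed.year != reporting_year:
        return "out_of_range"
    return "ok"


def edit(record, reporting_year):
    """Editing: keep the raw value, add a cleaned value and the diagnosis."""
    status = diagnose(record, reporting_year)
    cleaned = dict(record)
    cleaned["date_status"] = status
    cleaned["test_date_clean"] = (
        parse_date(str(record["test_date"])) if status == "ok" else None
    )
    return cleaned


# Toy records, invented for illustration (not study data).
line_list = [
    {"id": 1, "test_date": "05/08/2016"},
    {"id": 2, "test_date": ""},
    {"id": 3, "test_date": "sometime in Aug"},
    {"id": 4, "test_date": "2014-01-02"},
]
cleaned_list = [edit(r, 2016) for r in line_list]
needs_review = [r["id"] for r in cleaned_list if r["date_status"] != "ok"]
```

Keeping the raw value next to the cleaned one means each edit can be audited and the run repeated, in keeping with the reproducibility goal stated above.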
- Advances in Information and Communication Technologies for Development (ICT4D) and data science have enabled a systematic, reproducible, and scalable approach to data cleaning in routine health information systems.
- A logic model was developed for data cleaning, incorporating an algorithm for screening, diagnosing, and editing datasets in a rule-based, interactive, and semi-automated manner.
- The model was applied to the dengue line-list of the National Vector Borne Disease Control Programme in Punjab, India from 2015 to 2019.
- Cleaning and imputation covered over 96% of records for 2015, 98.9% for 2016, and all cases from 2017 to 2019.
- Age was extracted from more than 98.4% of records and sex from more than 99.4%.
- The logic model produced an analysis-ready dataset that supports the study of spatiotemporal epidemiology and data-driven public health decision-making.
- Limitations: the reasons behind data anomalies were outside the study's scope, and additional screening mechanisms are needed when applying the algorithm to other diseases or program datasets.
- Despite these limitations, the study's strength lies in its reproducible and scalable logic algorithm for preprocessing routine health surveillance data, which could be applied to other diseases globally, beyond dengue surveillance in India.
- Future studies should explore additional variables relevant to health program managers and test the external generalizability of the algorithm across diverse contexts. Incorporating robust ICT4D principles into research practices can support sustainable improvements in data quality.
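The age and sex extraction mentioned above can be sketched as follows. This assumes, hypothetically, that both values sit in a single free-text field such as "35/M" or "F 7 yrs"; the source does not describe the raw layout or the rules used. The function name, the accepted sex labels, and the 0 to 120 age range are all illustrative choices.

```python
import re

# Hypothetical mapping of sex labels to a standard code.
SEX_WORDS = {"m": "M", "male": "M", "f": "F", "female": "F"}


def extract_age_sex(text):
    """Pull the first plausible age and the first sex label out of free text."""
    # Split into runs of letters and runs of digits, so "35M" becomes ["35", "M"].
    tokens = re.findall(r"[A-Za-z]+|\d+", text)
    age = None
    sex = None
    for token in tokens:
        if token.isdigit() and age is None:
            value = int(token)
            if 0 <= value <= 120:  # assumed plausibility range, in years
                age = value
        elif token.lower() in SEX_WORDS and sex is None:
            sex = SEX_WORDS[token.lower()]
    return age, sex


def extraction_rate(values):
    """Percentage of entries that are not None."""
    return 100.0 * sum(v is not None for v in values) / len(values)


# Toy inputs, invented for illustration (not study data).
raw_entries = ["35/M", "F 7 yrs", "Female, 42", "35M", "unknown"]
parsed = [extract_age_sex(entry) for entry in raw_entries]
age_rate = extraction_rate([age for age, _ in parsed])
```

The sketch treats every number as an age in years, so an entry such as "6 months" would be misread; a fuller rule set would need a unit check and a review flag for such cases.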
Summary
- New technologies and data science help clean health data more efficiently.
- A rule-based method was created to check and fix datasets in a semi-automated way.
- The method was used successfully in India to clean dengue disease data.
- Most of the data from 2015 to 2019 was cleaned accurately.
- Important information such as age and sex was extracted from almost all records.
Definitions
- Technologies: Tools or methods used to do things better or faster.
- Data science: Using math and technology to understand and work with data.
- Datasets: Collections of information or data organized for analysis.
- Dengue: A type of disease spread by mosquitoes causing fever and body pain.
- Epidemiology: Studying how diseases spread in populations.
Advances in Information and Communication Technologies for Development (ICT4D) and data science have made a systematic, reproducible, and scalable approach to data cleaning possible in routine health information systems. A logic model that incorporates an algorithm for screening, diagnosing, and editing datasets has shown promising results in improving data quality within health surveillance systems.
The study developed a logic model for data cleaning and applied it to the dengue line-list of the National Vector Borne Disease Control Programme in Punjab, India from 2015 to 2019. The aim was to create an analysis-ready dataset that could be used to understand spatiotemporal epidemiology and support data-driven public health decision-making. Over 96% of records from 2015 and 98.9% of records from 2016 were cleaned, with missing date values imputed. All cases from 2017 to 2019 were cleaned and imputed as well.
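Per-year figures like those above amount to a completeness calculation over the cleaned line-list. A minimal sketch, assuming hypothetical fields `year` and `est_date`, with `None` marking a record whose date could not be cleaned or imputed:

```python
from collections import defaultdict


def completeness_by_year(records, field):
    """Percentage of records per year whose given field is not None."""
    totals = defaultdict(int)
    usable = defaultdict(int)
    for record in records:
        year = record["year"]
        totals[year] += 1
        if record.get(field) is not None:
            usable[year] += 1
    return {year: 100.0 * usable[year] / totals[year] for year in sorted(totals)}


# Toy records, invented for illustration; these are not the study's data.
toy_records = [
    {"year": 2015, "est_date": "2015-08-01"},
    {"year": 2015, "est_date": None},
    {"year": 2016, "est_date": "2016-09-10"},
]
rates = completeness_by_year(toy_records, "est_date")
```

Reporting the share by year, rather than one pooled figure, makes it easier to see whether data quality differs between reporting periods.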
One of the key strengths of this study lies in its use of ICT4D principles which have enabled researchers to develop a logic model that is not only reproducible but also scalable. This means that it can be applied not just to dengue surveillance in India but also potentially to other diseases globally. By incorporating robust ICT4D principles into research practices, sustainable improvements in data quality can be achieved.
However, it is important to note some limitations of this study as well. While the logic model developed showed significant progress in enhancing data quality within routine health surveillance systems, it did not address the underlying reasons behind data anomalies. This highlights a need for further research in this area.
Another limitation is that the algorithm was developed on a single disease dataset, the dengue line-list, so it may require additional screening mechanisms when applied to other diseases or program datasets. Future studies should therefore explore additional variables relevant to health program managers and researchers, and test the external generalizability of the algorithm across diverse contexts.
Despite these limitations, this study has paved the way for a more efficient and effective approach to data cleaning in routine health information systems. By utilizing ICT4D principles and data science techniques, researchers have been able to develop a logic model that can be applied on a larger scale, leading to insights into disease epidemiology over time and space.
The implications of this research extend beyond India to other developing countries where routine health surveillance systems may face similar data quality challenges. With a systematic approach to data cleaning, decision-makers gain access to reliable and accurate data, which is crucial for designing effective public health interventions.
In conclusion, the application of ICT4D principles and data science techniques in developing a logic model for data cleaning has shown promising results in improving data quality within routine health surveillance systems. This study serves as an important step towards achieving sustainable improvements in public health interventions worldwide. Further research in this area will continue to enhance our understanding of disease epidemiology and support evidence-based decision-making for better health outcomes globally.