A COCO-Formatted Instance-Level Dataset for Plasmodium Falciparum Detection in Giemsa-Stained Blood Smears

Frauke Wilm1,2, Luis Carlos Rivera Monroy1,2, Mathias Öttl1,2, Lukas Mürdter1, Leonid Mill1,2, Andreas Maier2
1: MIRA Vision Microscopy GmbH, 73037 Göppingen, Germany, 2: Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Erlangen, Germany
Publication date: 2025/12/31
https://doi.org/10.59275/j.melba.2025-46d9

Abstract

Accurate detection of Plasmodium falciparum in Giemsa-stained blood smears is an essential component of reliable malaria diagnosis, especially in developing countries. Deep learning-based object detection methods have demonstrated strong potential for automated malaria diagnosis, but their adoption is limited by the scarcity of datasets with detailed instance-level annotations. In this work, we present an enhanced version of the publicly available NIH malaria dataset, with detailed bounding box annotations in COCO format to support object detection training. We validated the revised annotations by training a Faster R-CNN model to detect infected and non-infected red blood cells, as well as white blood cells. Cross-validation on the original dataset yielded F1 scores of up to 0.88 for infected cell detection. These results underscore the importance of annotation volume and consistency, and demonstrate that automated annotation refinement combined with targeted manual correction can produce training data of sufficient quality for robust detection performance. The updated annotation set is publicly available via Zenodo: https://doi.org/10.5281/zenodo.17514694
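
For readers unfamiliar with the COCO detection layout mentioned above, the following is a minimal sketch of what a three-class annotation file could look like, together with the standard F1 formula. The category names and ids, the file name, the image size, and the box coordinates are illustrative assumptions and are not taken from the released annotation files; only the general COCO structure (images, annotations, categories, and boxes given as [x, y, width, height]) is standard. The abstract does not state how detections are matched to ground truth, so the sketch takes true-positive, false-positive, and false-negative counts as given.

```python
import json

# Hypothetical category names and ids; the released files may use different ones.
CATEGORIES = [
    {"id": 1, "name": "infected_rbc"},
    {"id": 2, "name": "non_infected_rbc"},
    {"id": 3, "name": "wbc"},
]


def make_coco(images, boxes):
    """Build a COCO-style detection dictionary.

    images: list of (image_id, file_name, width, height)
    boxes:  list of (image_id, category_id, x, y, w, h), where (x, y) is the
            top-left corner and (w, h) the box size, all in pixels
    """
    return {
        "images": [
            {"id": i, "file_name": f, "width": w, "height": h}
            for i, f, w, h in images
        ],
        "annotations": [
            {
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_id,
                "bbox": [x, y, w, h],  # COCO convention: x, y, width, height
                "area": w * h,
                "iscrowd": 0,
            }
            for ann_id, (img_id, cat_id, x, y, w, h) in enumerate(boxes, start=1)
        ],
        "categories": CATEGORIES,
    }


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up image and boxes, purely for illustration.
coco = make_coco(
    images=[(1, "example_smear.png", 1024, 768)],
    boxes=[(1, 1, 100, 120, 40, 42), (1, 2, 300, 310, 38, 39)],
)
print(json.dumps(coco["annotations"][0]))
```

A file written with `json.dump(coco, ...)` in this layout can be read by common COCO-compatible loaders, which is the practical benefit of distributing annotations in this format.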

Keywords

Malaria · Plasmodium Falciparum · Thin Blood Smear · NIH · COCO

Bibtex
@article{melba:2025:040:wilm,
  title = "A COCO-Formatted Instance-Level Dataset for Plasmodium Falciparum Detection in Giemsa-Stained Blood Smears",
  author = "Wilm, Frauke and Rivera Monroy, Luis Carlos and Öttl, Mathias and Mürdter, Lukas and Mill, Leonid and Maier, Andreas",
  journal = "Machine Learning for Biomedical Imaging",
  volume = "3",
  issue = "Special Issue on Open Data at MICCAI 2024–2025",
  year = "2025",
  pages = "849--855",
  issn = "2766-905X",
  doi = "https://doi.org/10.59275/j.melba.2025-46d9",
  url = "https://melba-journal.org/2025:040"
}
RIS
TY  - JOUR
AU  - Wilm, Frauke
AU  - Rivera Monroy, Luis Carlos
AU  - Öttl, Mathias
AU  - Mürdter, Lukas
AU  - Mill, Leonid
AU  - Maier, Andreas
PY  - 2025
TI  - A COCO-Formatted Instance-Level Dataset for Plasmodium Falciparum Detection in Giemsa-Stained Blood Smears
T2  - Machine Learning for Biomedical Imaging
VL  - 3
IS  - Special Issue on Open Data at MICCAI 2024–2025
SP  - 849
EP  - 855
SN  - 2766-905X
DO  - https://doi.org/10.59275/j.melba.2025-46d9
UR  - https://melba-journal.org/2025:040
ER  -
