1 Introduction
Although rare, pediatric tumors of the central nervous system are the leading cause of cancer-related death in children (Ostrom et al., 2022; Hossain et al., 2021). While they share some similarities with adult brain tumors, their imaging characteristics and clinical presentations differ significantly. For example, both adult glioblastoma (GBM) and pediatric diffuse midline glioma (DMG), including diffuse intrinsic pontine glioma (DIPG), are high-grade glial tumors with poor survival outcomes (Mackay et al., 2017; Jansen et al., 2015). However, whereas the incidence of adult GBM is 3 in 100,000 people, DMG is approximately three times rarer. GBM primarily affects the frontal and temporal lobes in adults (median diagnosis age: 64 years) (Thakkar et al., 2014), but DMG predominantly arises in the pons and is diagnosed in children aged 5–10 years. Unlike GBM, which frequently presents with necrotic and enhancing regions on post-gadolinium T1-weighted MRI, DMG often lacks clear necrotic components at initial diagnosis. These differences underscore the need for dedicated imaging tools for pediatric brain tumors to improve diagnosis and prognosis (Familiar et al., 2024).
Pediatric brain tumors exhibit high heterogeneity in histology and imaging features, often comprising subregions such as peritumoral edematous/infiltrated tissue, necrosis, and enhancing tumor margins, reflected in their radio-phenotypes on multi-parametric MRI (mpMRI) images (Fathi Kazerooni et al., 2023a). Tumors in surgically inaccessible locations, such as DIPG, rely primarily on longitudinal MRI for progression assessment. The current standard, defined by the Response Assessment in Pediatric Neuro-Oncology (RAPNO) cooperative working group (Cooney et al., 2020; Erker et al., 2020), involves 2D linear measurements of tumor subregions on the axial slice of the largest extent (Familiar et al., 2024). However, manual segmentation is labor-intensive and prone to inter-operator variability, while 2D measurements provide an inaccurate surrogate for tumor volume due to tumor shape irregularities. Studies in adult and pediatric brain tumors have demonstrated the superiority of 3D volumetric measurements over 2D assessments in predicting clinical outcomes (Ellingson et al., 2022; Lazow et al., 2022).
Automated tumor segmentation is critical for volumetric analysis, aiding in surgical planning, treatment response assessment, and longitudinal monitoring. The importance of volumetric tumor measurements—and, consequently, automated segmentation—has been increasingly recognized in pediatric neuro-oncology (Jansen et al., 2015; Lazow et al., 2022). Several automated tumor segmentation methods have been developed for pediatric brain tumors (Fathi Kazerooni et al., 2023a; Vossough et al., 2022; Artzi et al., 2020; Kharaji et al., 2024; Liu et al., 2023; Mansoor et al., 2016; Tor-Diez et al., 2020; Peng et al., 2022; Nalepa et al., 2022; Gandhi et al., 2024; Boyd et al., 2024; Bareja et al., 2024; Hashmi et al., 2024). Among available architectures, 3D U-Net (Tor-Diez et al., 2020; Peng et al., 2022) and U-Net combined with ResNet (Artzi et al., 2020) have been employed for whole tumor segmentation on T2-weighted or Fluid Attenuated Inversion Recovery (FLAIR) images and enhancing tumor segmentation on post-contrast T1-weighted imaging. The nnU-Net architecture (Isensee et al., 2021) has emerged as a leading model for pediatric brain tumor segmentation, demonstrating strong performance across different pediatric tumor types (Vossough et al., 2022; Kharaji et al., 2024; Liu et al., 2023; Gandhi et al., 2024; Bareja et al., 2024).
Transfer learning has also been applied, with models pretrained on adult glioma datasets and later fine-tuned for pediatric applications (Kharaji et al., 2024; Liu et al., 2023; Nalepa et al., 2022; Boyd et al., 2024). However, due to structural and developmental changes in the pediatric brain, transfer learning from adult to pediatric populations may not yield optimal results (Boyd et al., 2024). Beyond nnU-Net, alternative deep learning architectures have been explored, including MedNeXt, a ConvNeXt-based model optimized for brain tumor segmentation (Hashmi et al., 2024).
Most segmentation studies before the Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) challenge relied only on one or two MRI sequences (Artzi et al., 2020; Tor-Diez et al., 2020; Peng et al., 2022; Nalepa et al., 2022; Boyd et al., 2024). Few studies integrated multiparametric MRI, which is essential for accurately segmenting tumor subregions (Fathi Kazerooni et al., 2023a; Vossough et al., 2022; Liu et al., 2023). Furthermore, only a limited number of studies have developed pediatric-specific tumor subregion segmentation frameworks aligned with the RAPNO guidelines, spanning multiple histologies (Fathi Kazerooni et al., 2023a; Vossough et al., 2022; Peng et al., 2022; Gandhi et al., 2024). The absence of benchmarking platforms has further prevented standardized evaluation and comparison of these methods using a common validation dataset.
The BraTS challenges, organized in conjunction with the Medical Image Computing and Computer Assisted Intervention (MICCAI, https://miccai.org) since 2012, have established a benchmarking framework for adult glioma segmentation (Bakas et al., 2018; Menze et al., 2014; Bakas et al., 2017; Baid et al., 2021). In 2023, the BraTS challenge expanded to include a variety of tumor entities such as adult glioma, intracranial meningioma (LaBella et al., 2023, 2024a), brain metastases (Moawad et al., 2023), Sub-Saharan African glioma, and pediatric tumors (Kazerooni et al., 2023b) for the segmentation task. Additionally, new tasks were introduced, including Synthesis (Global) - Missing MRI (Li et al., 2023), Synthesis (Local) – Inpainting (Kofler et al., 2023), and Augmentation. The pediatric segmentation challenge (BraTS-PEDs) was initiated following the 2022 BraTS competition, where 60 pediatric DMGs were included in the test phase. The findings from this evaluation led to the expansion of the challenge in 2023, incorporating multi-consortia pediatric data for training, validation, and testing.
This manuscript presents the data, design, and outcomes of the BraTS-PEDs 2023 challenge, the first benchmarking initiative for pediatric brain tumor segmentation. By evaluating the performance of participating algorithms, this work aims to establish a standardized framework for pediatric brain tumor segmentation, facilitating clinical translation and extending applications to treatment response assessment in pediatric neuro-oncology.
2 Data and Methods
2.1 Cohort Description
The BraTS-PEDs 2023 dataset was supported by multiple consortia, including the Children’s Brain Tumor Network (CBTN, https://cbtn.org/) (Lilly et al., 2023), the International DIPG/DMG Registry (https://www.dipgregistry.org), and the COllaborative Network for NEuro-oncology Clinical Trials (CONNECT, https://connectconsortium.org/), along with additional clinical centers, including Boston Children’s Hospital and Yale University. All cases contained pre- and post-gadolinium T1-weighted (labeled as T1 and T1CE, respectively), T2-weighted (T2), and T2-weighted fluid attenuated inversion recovery (FLAIR) images. These conventional mpMRI sequences are commonly acquired as part of standard clinical care for brain tumors. However, image acquisition protocols and MRI equipment differ across institutions, resulting in heterogeneity in image quality and appearance in the multi-consortium cohort. Inclusion criteria comprised pediatric subjects with: (1) histologically confirmed high-grade glioma, i.e., astrocytoma and DMG, including radiologically or histologically proven DIPG; and (2) availability of all four structural mpMRI sequences from treatment-naive imaging sessions. Exclusion criteria included: (1) images of low quality or containing artifacts precluding reliable tumor segmentation, as determined by quality control (QC) performed by researchers experienced in pediatric brain tumor imaging and segmentation; and (2) infants younger than one month of age. The latter criterion was imposed because the neonatal brain within the first month undergoes rapid developmental changes, including immature myelination and evolving tissue contrasts (Gilmore et al., 2007), which can complicate image segmentation and analysis. In the 2023 BraTS-PEDs, data for 167 patients were included (CBTN, n = 113; Boston Children’s Hospital, n = 30; Yale University, n = 24). Data from the International DIPG/DMG Registry were processed but not segmented for the 2023 challenge and will be included in the 2024 challenge.
2.2 Imaging Data
For all patients, mpMRI scans were pre-processed using a standardized approach, including conversion of the DICOM files to the NIfTI file format, co-registration to the same anatomical template (i.e., SRI24) (Rohlfing et al., 2010), and resampling to an isotropic resolution of 1 mm³. The pre-processing pipeline is publicly available through the Cancer Imaging Phenomics Toolkit (CaPTk) (Pati et al., 2020; Rathore et al., 2018; Davatzikos et al., 2018), the Federated Tumor Segmentation (FeTS) tool (https://github.com/FETS-AI/Front-End/) (Pati et al., 2022), and the docker container for the pediatric-specific pipeline, including skull-stripping and tumor segmentation, on GitHub: https://github.com/d3b-center/peds-brain-seg-pipeline-public. De-identification was performed by removing protected health information from DICOM headers. Defacing was performed via a pediatric-specific skull-stripping method (Fathi Kazerooni et al., 2023a) to prevent any potential facial reconstruction/recognition of the patients. Figure 1A presents the data preparation workflow.
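To make two of these steps concrete, the following is a minimal sketch of isotropic resampling and rigid co-registration to an atlas using SimpleITK. This is an illustration under stated assumptions, not the CaPTk/FeTS implementation; the file names (e.g., `SRI24_T1.nii.gz`) and registration settings are hypothetical.

```python
# Minimal sketch of isotropic resampling and rigid atlas registration.
# Not the official CaPTk/FeTS pipeline; file names are hypothetical.
import SimpleITK as sitk

def resample_isotropic(image, spacing_mm=1.0):
    """Resample an image to isotropic voxels (default 1 mm^3)."""
    orig_spacing, orig_size = image.GetSpacing(), image.GetSize()
    new_size = [int(round(sz * sp / spacing_mm))
                for sz, sp in zip(orig_size, orig_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), (spacing_mm,) * 3,
                         image.GetDirection(), 0.0, image.GetPixelID())

def rigid_register(moving, fixed):
    """Rigidly align `moving` to `fixed` (e.g., an SRI24 template)."""
    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetOptimizerAsRegularStepGradientDescent(
        learningRate=1.0, minStep=1e-4, numberOfIterations=200)
    reg.SetInitialTransform(sitk.CenteredTransformInitializer(
        fixed, moving, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY))
    reg.SetInterpolator(sitk.sitkLinear)
    tx = reg.Execute(sitk.Cast(fixed, sitk.sitkFloat32),
                     sitk.Cast(moving, sitk.sitkFloat32))
    # Resample the moving image onto the fixed image's grid
    return sitk.Resample(moving, fixed, tx, sitk.sitkLinear, 0.0)

atlas = sitk.ReadImage("SRI24_T1.nii.gz")                 # hypothetical path
t1 = resample_isotropic(sitk.ReadImage("sub-001_T1.nii.gz"))
sitk.WriteImage(rigid_register(t1, atlas), "sub-001_T1_SRI24.nii.gz")
```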
The pre-processed images were segmented into tumor subregions using one of two pediatric automated deep learning segmentation models (Fathi Kazerooni et al., 2023a; Liu et al., 2023) before being manually corrected by pediatric neuro-radiologists. The tumors were segmented into four subregions: enhancing tumor (ET), non-enhancing tumor (NET), cystic component (CC), and peritumoral edema (ED), with definitions provided in (Familiar et al., 2024) and summarized below:
- ET is defined as areas of enhancement (brightness) on post-contrast T1 images as compared to pre-contrast T1. In cases of mild enhancement, checking the signal intensity of normal brain structures can be helpful.
- CC typically appears with hyperintense signal (very bright) on T2 and hypointense signal (dark) on T1CE. The cystic portion should be within the tumor, either centrally or peripherally (as compared to ED, which is peritumoral). The brightness of CC is comparable or close to that of cerebrospinal fluid (CSF).
- NET is defined as any other abnormal signal intensity within the tumorous region that cannot be classified as enhancing or cystic. For example, abnormal signal intensity on T1, FLAIR, and T2 that is not enhancing on T1CE should be considered non-enhancing tumor.
- ED is defined by abnormal hyperintense signal (very bright) on FLAIR scans. ED spreads in a finger-like pattern that preserves the underlying brain structure and surrounds the tumor.
The American Society of Neuroradiology (ASNR, https://www.asnr.org/) collaborated in generating ground truth annotations for the majority of data in this challenge. Four labels were generated (Figure 1) using the preliminary automated segmentation and were then manually revised by ASNR volunteer neuroradiology experts of varying rank and experience in accordance with the annotation guidelines. The expert annotators were provided with the four mpMRI sequences (T1, T1CE, T2, FLAIR) along with the fused automated segmentation volume to initiate the manual refinements using the ITK-SNAP software (Yushkevich et al., 2006). After segmentation corrections by the annotators, three attending board-certified neuroradiologists reviewed the segmentations to either approve them or return them to the individual annotators for further refinement. This process was followed iteratively until the approvers found the refined tumor subregion segmentations acceptable for public release and the conduct of the challenge. The final segmentations were provided and used as ground truth for model training and evaluation.

Although four tumor subregions were annotated for all the pediatric data, to follow the guidelines of the other BraTS 2023 challenges and to allow the teams to potentially incorporate data from the adult glioma challenge into their model training, the teams were provided with only three segmentation labels: non-enhancing component (NC – the union of non-enhancing tumor, cystic component, and necrosis; label 1), ED (label 2), and ET (label 3), as shown in Figure 1B. The color maps in Figure 1B compare the original annotation labels used by the annotators (left) with the final labels provided to the teams (right), which adhere to the standards of the other BraTS challenges.
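A minimal sketch of this label consolidation follows. The integer codes assumed here for the four annotated subregions (ET = 1, NET = 2, CC = 3, ED = 4) are illustrative assumptions, not the released encoding; the released files carry the three challenge labels directly.

```python
# Sketch of collapsing four annotated subregions into the three challenge
# labels (NC = NET ∪ CC, ED, ET). Annotation codes below are assumed.
import numpy as np
import nibabel as nib

ANNOT = {"ET": 1, "NET": 2, "CC": 3, "ED": 4}   # assumed internal codes
CHALLENGE = {"NC": 1, "ED": 2, "ET": 3}          # labels released to teams

def remap_labels(annot):
    """Map a four-subregion label map to the three challenge labels."""
    out = np.zeros(annot.shape, dtype=np.uint8)
    out[np.isin(annot, (ANNOT["NET"], ANNOT["CC"]))] = CHALLENGE["NC"]
    out[annot == ANNOT["ED"]] = CHALLENGE["ED"]
    out[annot == ANNOT["ET"]] = CHALLENGE["ET"]
    return out

img = nib.load("sub-001_seg_4label.nii.gz")      # hypothetical file
seg3 = remap_labels(np.asarray(img.dataobj))
nib.save(nib.Nifti1Image(seg3, img.affine), "sub-001_seg_3label.nii.gz")
```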
2.3 Challenge Setup
The BraTS-PEDs 2023 cohort was split into training (n = 98), validation (n = 44), and testing (n = 24) datasets. The training and validation sets were formed by randomly splitting the CBTN and Boston cases, while all Yale cases were reserved for independent testing. The training data shared with the teams comprised mpMRI scans and ground truth labels for the training cohort, released in mid-May 2023. The validation cohort included only the mpMRI sequences without associated ground truth and was released in late June 2023. Notably, the testing cohort was withheld from the participants.
Teams were prohibited from training their algorithms on any additional public and/or private data besides the provided BraTS-PEDs 2023 data, and from using models pretrained on other datasets. This restriction was imposed to allow for a direct and fair comparison among the participating methods. However, teams were allowed to use additional public and/or private data in their scientific publications, on the condition that they also report their results on the BraTS-PEDs 2023 challenge data alone and discuss potential differences in the obtained results.
The methods submitted to the BraTS-PEDs 2023 challenge were evaluated on their segmentation of the ET, the tumor core (TC – the union of ET and NC), and the entire tumorous region, or whole tumor (WT – the union of ET, NC, and ED) (Figure 1C). The containerized submissions of the teams were evaluated on the synapse.org platform supported by MedPerf (Karargyris et al., 2023). During the training and validation phases, teams were required to send the output of their methods to the evaluation platform for scoring. At the end of the validation phase, teams were asked to identify the method they would like the organizers to evaluate in the final testing/ranking phase. The training, validation, and testing phases ran between May-July 2023, July-August 2023, and August 2023, respectively. The results and ranking of the challenge were presented at the MICCAI 2023 conference on October 10, 2023.
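As an illustration, the three evaluated regions can be derived from a 3-label segmentation map (NC = 1, ED = 2, ET = 3, per the convention described above) with simple unions:

```python
# Sketch of deriving the evaluated composite regions from the challenge labels.
import numpy as np

def eval_regions(seg):
    """Binary masks of the evaluated regions from a map with NC=1, ED=2, ET=3."""
    et = seg == 3
    tc = et | (seg == 1)        # TC = ET ∪ NC
    wt = tc | (seg == 2)        # WT = ET ∪ NC ∪ ED
    return {"ET": et, "TC": tc, "WT": wt}

seg = np.zeros((4, 4, 4), dtype=np.uint8)
seg[1:3, 1:3, 1:3] = 3          # toy ET voxels for demonstration
masks = eval_regions(seg)
```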
2.4 Benchmark Design: Evaluation and Ranking
The evaluation and ranking were based on a lesion-wise ranking scheme in which each team's ranking was computed relative to the other challenge competitors for each subject in the test set and for each tumor region, i.e., ET, TC, and WT. We used two established performance evaluation metrics, the Dice similarity coefficient (Dice) and the 95% Hausdorff distance (HD95), computed using the Generally Nuanced Deep Learning Framework (GaNDLF; gandlf.org) (Pati et al., 2023). At BraTS-PEDs 2023, each team was ranked on 24 test subjects, 3 tumor regions, and 2 metrics, resulting in 144 individual rankings. For each team, the individual rankings were first combined into a cumulative rank for each test subject, and the final ranking score (FRS) was then obtained by averaging the cumulative ranks across all testing subjects.
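A minimal sketch of this ranking scheme on synthetic scores follows. The per-subject aggregation used here (summing the six ranks per subject, consistent with the magnitudes reported in Table 1) is an assumption of this illustration; the official evaluation code is authoritative.

```python
# Sketch of the per-subject, per-region, per-metric ranking scheme and FRS.
import numpy as np
from scipy.stats import rankdata

n_teams, n_subjects, n_regions = 9, 24, 3
rng = np.random.default_rng(0)                              # synthetic scores
dice = rng.random((n_teams, n_subjects, n_regions))         # higher is better
hd95 = rng.random((n_teams, n_subjects, n_regions)) * 100   # lower is better

ranks = np.empty((n_teams, n_subjects, n_regions, 2))
for s in range(n_subjects):
    for r in range(n_regions):
        ranks[:, s, r, 0] = rankdata(-dice[:, s, r])  # rank 1 = best Dice
        ranks[:, s, r, 1] = rankdata(hd95[:, s, r])   # rank 1 = best HD95

cumulative = ranks.sum(axis=(2, 3))   # cumulative rank per (team, subject)
frs = cumulative.mean(axis=1)         # final ranking score per team
print(np.argsort(frs))                # team indices ordered best to worst
```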
The lesion-wise Dice and HD95 were calculated by isolating and identifying each disjoint lesion, obtained by dilating the ground truth segmentation masks for each tumor region (ET, TC, and WT) by 3 pixels in all directions (i.e., with a 3×3×3 structuring element). Dice and HD95 scores were calculated for each component or lesion individually, with false positive and false negative lesions penalized with a score of 0. Subsequently, the mean Dice and HD95 scores across all lesions were calculated for each subject. Lesions below a cutoff of 50 voxels were excluded from evaluation. If a model failed to predict an existing region for a subject, the Dice score was assigned a value of 0 and HD95 was set to 374 (the distance between corners in the SRI atlas space); if it correctly predicted the absence of a non-existing region, the Dice score and HD95 were given values of 1 and 0, respectively. Lesion-wise Dice and HD95 scores for each case were calculated according to equations 1 and 2. More information on the evaluation metrics, along with the code, can be found at https://github.com/rachitsaluja/BraTS-2023-Metrics.
$$\text{Lesion-wise Dice} = \frac{\sum_{i=1}^{TP} \text{Dice}(l_i)}{TP + FN + FP} \qquad (1)$$

$$\text{Lesion-wise HD95} = \frac{\sum_{i=1}^{TP} \text{HD95}(l_i) + 374 \cdot (FN + FP)}{TP + FN + FP} \qquad (2)$$

where $l_i$ denotes the $i$-th true positive ground truth lesion, $L = TP + FN$ represents the number of ground truth lesions, $TP$ denotes the number of true positives, $FN$ indicates the number of false negatives, and $FP$ signifies the number of false positives.
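The following is a simplified sketch of the lesion-wise Dice computation just described (connected-component matching after dilation, the 50-voxel cutoff, and zero scores for unmatched lesions). The official implementation in the repository linked above handles additional details; lesion-wise HD95 follows the same matching with 374 as the penalty value.

```python
# Simplified sketch of lesion-wise Dice; the official BraTS-2023-Metrics
# code is authoritative and differs in details.
import numpy as np
from scipy import ndimage

def lesion_wise_dice(gt, pred, min_voxels=50):
    """Lesions are connected components of the dilated ground truth;
    FN and FP lesions score 0; lesions under `min_voxels` are ignored."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    comps, n = ndimage.label(
        ndimage.binary_dilation(gt, structure=np.ones((3, 3, 3))))
    scores, unmatched = [], pred.copy()
    for i in range(1, n + 1):
        comp = comps == i
        gt_i = gt & comp
        if gt_i.sum() < min_voxels:
            continue                           # below the 50-voxel cutoff
        pred_i = pred & comp
        unmatched &= ~comp                     # predictions here are matched
        scores.append(2.0 * (gt_i & pred_i).sum()
                      / (gt_i.sum() + pred_i.sum()))
    fp_comps, n_fp = ndimage.label(unmatched)  # leftover predictions are FPs
    for j in range(1, n_fp + 1):
        if (fp_comps == j).sum() >= min_voxels:
            scores.append(0.0)
    return float(np.mean(scores)) if scores else 1.0  # empty GT and prediction
```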
Finally, the statistical differences in performance between each pair of teams, beyond those expected by chance, were determined using permutation testing. For each pair of teams, the cumulative ranks for each subject were randomly permuted, and the difference in FRS between the pair was calculated in each permutation. The p-value was defined as the proportion of permutations in which the difference in FRS on the permuted data exceeded the observed difference in FRS on the actual data. These values are reported in an upper triangular matrix.
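A minimal sketch of this paired permutation test, taking hypothetical per-subject cumulative ranks of two teams as input, could look as follows:

```python
# Sketch of a paired permutation test on per-subject cumulative ranks.
import numpy as np

def perm_test(ranks_a, ranks_b, n_perm=10_000, seed=0):
    """p-value: fraction of permutations whose FRS difference meets or
    exceeds the observed difference (per-subject label swaps)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(ranks_a, dtype=float)
    b = np.asarray(ranks_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    count = 0
    for _ in range(n_perm):
        swap = rng.random(a.size) < 0.5     # randomly swap the pair per subject
        pa = np.where(swap, b, a)
        pb = np.where(swap, a, b)
        count += abs(pa.mean() - pb.mean()) >= observed
    return count / n_perm
```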
2.5 Data Availability
The training data, including ground truth segmentations, and the validation data, without ground truth segmentations, can be accessed and downloaded from Synapse at www.synapse.org/Synapse:syn51156910/wiki/627000. More information about the BraTS-PEDs 2023 challenge is available at www.synapse.org/Synapse:syn51156910/wiki/622461.
3 Results
3.1 Overview of the Submissions
Among the teams that participated in the training phase, 18 teams submitted a total of 159 entries during the validation phase, with 147 of these submissions receiving scores. Nine teams completed all challenge stages and were evaluated during the testing phase of the BraTS‐PEDs 2023 challenge (Capellán-Martín et al., 2023; Myronenko et al., 2023; Zhou et al., 2023; Huang et al., 2023; Maani et al., 2023; Mistry et al., 2023; Javaji et al., 2023; Mantha et al., 2023; Temtam et al., 2023). During the competition, participants were allowed unlimited validation of their algorithms on the Synapse platform using validation data with withheld ground truth, which enabled them to iteratively fine-tune their models before making their final submission for testing. The validation outcomes for each team are reported in their respective publications (Capellán-Martín et al., 2023; Myronenko et al., 2023; Zhou et al., 2023; Huang et al., 2023; Maani et al., 2023; Mistry et al., 2023; Javaji et al., 2023; Mantha et al., 2023; Temtam et al., 2023).
3.2 Summary of Participating Methods in BraTS-PEDs 2023 Challenge
The participating teams in the BraTS-PEDs 2023 challenge utilized diverse deep learning-based segmentation approaches, incorporating various architectures, training strategies, and post-processing techniques to tackle the complexities of pediatric brain tumor segmentation. Below, we provide a summarized overview of these methods (see Table 1 for a summary of the proposed techniques by each team), with further details available in the individual publications (Capellán-Martín et al., 2023; Myronenko et al., 2023; Zhou et al., 2023; Huang et al., 2023; Maani et al., 2023; Mistry et al., 2023; Javaji et al., 2023; Mantha et al., 2023; Temtam et al., 2023).
Several teams used variants of the U-Net architecture, with modifications to improve feature extraction and segmentation accuracy. The BiomedMBZ team implemented SegResNet and MedNeXt with deep supervision and extensive pre- and post-processing (Maani et al., 2023). Blackbean introduced the Scalable and Transferable U-Net (STU-Net), a large-scale model with up to 1.4 billion parameters, pre-trained on the TotalSegmentator dataset (Huang et al., 2023).
Several teams adopted ensemble approaches to improve robustness. CNMCPMI2023 combined nnU-Net with Swin UNETR, applying label-wise ensembling and cross-validated thresholding to refine segmentation outputs (Capellán-Martín et al., 2023). SchlaugLab developed a model combining ONet and modified U-Net variants, integrating custom loss functions and composite data augmentation (Javaji et al., 2023). SherlockZyb extended nnU-Net with self-supervised pre-training and an adaptive region-specific loss function to handle tumor heterogeneity and class imbalance (Zhou et al., 2023).
Automated model selection was explored by NVAUTO, which utilized Auto3DSeg from MONAI, requiring minimal user input while training SegResNet with 5-fold cross-validation and deep supervision (Myronenko et al., 2023). Meanwhile, UMNiverse introduced a GAN-based model, the Temporal Cubic PatchGAN (TCuP-GAN), incorporating convolutional long short-term memory networks (ConvLSTMs) to leverage temporal MRI features (Mantha et al., 2023).

Other teams focused on novel architectural modifications. The Isahajmistry team enhanced nnU-Net with Omni-Dimensional Dynamic Convolution (ODConv3D) and multi-scale attention mechanisms for improved feature learning (Mistry et al., 2023). VisionLab proposed a Multiresolution Fractal Deep Neural Network (MFDNN), using wavelet-based convolution layers to extract tumor textures at multiple scales while performing uncertainty analysis to enhance segmentation reliability (Temtam et al., 2023).
3.3 Evaluation and Ranking Results
Table 1 summarizes the cumulative ranks and FRS for each team across all test subjects, the teams’ final rankings, and a high-level description of the methods used by each team. The CNMCPMI2023 team (Capellán-Martín et al., 2023) achieved the highest score, followed by NVAUTO (Myronenko et al., 2023), SherlockZyb (Zhou et al., 2023), and Blackbean (Huang et al., 2023).
Figure 2 illustrates the p-values calculated between pairs of teams to evaluate the statistical significance of differences in their achieved performance. Specifically, among the best-performing teams, the results for CNMCPMI2023 and NVAUTO (p = 0.45), NVAUTO and SherlockZyb (p = 0.1), and SherlockZyb and Blackbean (p = 0.43) were not significantly different. This may be due to the small sample size of the test cohort (n = 24) for this challenge.
Figures 3 to 5 display the Dice and HD95 scores of all participating teams for all test subjects. Tables 2 to 4 summarize the performance metrics—Dice, HD95, and sensitivity—of each team. The CNMCPMI2023, NVAUTO, SherlockZyb, and Blackbean teams achieved Dice scores of 0.83, 0.84, 0.83, and 0.81 for WT; 0.81, 0.78, 0.77, and 0.79 for TC; and 0.65, 0.55, 0.63, and 0.53 for ET segmentation, respectively, with corresponding HD95 values of 20.86, 18.05, 6.11, and 23.56 for WT; 21.82, 27.10, 22.28, and 22.02 for TC; and 43.89, 115.32, 59.68, and 22.02 for ET segmentation. Interestingly, the sensitivity of the NVAUTO algorithm for WT and ET segmentation was higher than that of the other top-ranked teams, indicating a higher overall rate of true positive predictions by this algorithm. However, this increased sensitivity came at the cost of over-segmentation, resulting in more false positives and less accurate boundary placement for ET segmentation.
The prediction results for ET demonstrate a greater variability and skewness in the Dice score distribution, as indicated by the mean and median values in Table 2 and the box plots in Figure 3. In contrast, the segmentation of TC and WT (Figures 4-5) shows less variability and skewness. Given the small volume or absence of the ET region in DMG tumors, the Dice score more heavily penalizes segmentation errors in the ET region. TC and WT segmentations are more consistent and accurate across all teams, with WT segmentation being the most precise. This indicates that, on average, most methods reliably locate the lesion within MRI scans, while distinguishing between different tumor compartments is more challenging.
Figure 6 showcases examples of the performance of the CNMCPMI2023, NVAUTO, SherlockZyb, and Blackbean teams for tumor segmentation in four testing subjects. Panels (A-B) show the best results for these four teams, with WT Dice above 0.90. Panels (C-D) present the subjects with the worst results across the top-performing teams. As these examples indicate, there are situations where one algorithm performs better than the others in segmenting tumorous regions. In Figure 6(C), CNMCPMI2023, SherlockZyb, and Blackbean achieved Dice scores of 0.88, while NVAUTO achieved a low Dice score (0.18) for WT segmentation, owing to many false positives in its segmentation of the tumor for this subject. On the other hand, in the example presented in Figure 6(D), the NVAUTO and Blackbean algorithms produced WT segmentation results with Dice scores of 0.60 and 0.70, respectively, while the other two algorithms, CNMCPMI2023 and SherlockZyb, produced little to no usable segmentation output (WT Dice = 0 and 0.05, respectively). This situation underscores the potential benefit of ensembling different algorithms to achieve better overall performance.
Table 1: Methods, cumulative ranks, FRS, and final ranking of the teams that completed the BraTS-PEDs 2023 challenge.

Teams | Method | Cumulative Ranks Across Subjects | FRS | Rank in the Challenge
---|---|---|---|---
CNMCPMI2023 | Ensemble of nnU-Net and Swin UNETR | 241.5 | 10.06 | –¹
NVAUTO | Auto3DSeg | 246 | 10.25 | 1
SherlockZyb | nnU-Net (self-supervised) | 286.5 | 11.93 | 2
Blackbean | Scalable and Transferable U-Net (STU-Net) | 290.5 | 12.1 | 3
BiomedMBZ | Ensemble of SegResNet and MedNeXt | 318 | 13.25 | 4
SchlaugLab | Ensemble of ONet and UNet | 395.5 | 16.47 | 5
VisionLabODU23 | Multiresolution Fractal Deep Neural Network (MFDNN) | 427 | 17.79 | 6
Isahajmistry | Modified nnU-Net | 483 | 20.12 | 7
UMNiverse | Temporal Cubic PatchGAN (TCuP-GAN) | 552 | 23 | 8

¹ Participating teams that included challenge organizers underwent evaluation without ranking to uphold fairness in our benchmarking initiative. This approach aligns with MICCAI guidelines, as well as the CBTN policy for hackathons.




Table 2: Performance metrics for ET segmentation on the test cohort; values are mean ± standard deviation (median).

Teams | Dice Similarity Coefficient | 95% Hausdorff Distance | Sensitivity
---|---|---|---|
CNMCPMI2023 | 0.65 ± 0.32 (0.74) | 43.89 ± 108.59 (3.67) | 0.70 ± 0.18 (0.74) |
NVAUTO | 0.55 ± 0.37 (0.68) | 115.32 ± 144.03 (4.04) | 0.77 ± 0.20 (0.80) |
SherlockZyb | 0.63 ± 0.32 (0.71) | 59.68 ± 116.70 (3.93) | 0.71 ± 0.27 (0.74) |
Blackbean | 0.53 ± 0.34 (0.60) | 22.02 ± 52.12 (5.83) | 0.63 ± 0.31 (0.71) |
BiomedMBZ | 0.55 ± 0.27 (0.61) | 45.32 ± 109.10 (5.69) | 0.80 ± 0.29 (0.92) |
SchlaugLab | 0.55 ± 0.36 (0.65) | 65.23 ± 122.23 (4.30) | 0.55 ± 0.37 (0.59) |
VisionLabODU23 | 0.17 ± 0.38 (0) | 311.67 ± 142.38 (374) | 0.17 ± 0.38 (0) |
Isahajmistry | 0.48 ± 0.36 (0.49) | 128.12 ± 147.11 (10.85) | 0.66 ± 0.35 (0.82) |
UMNiverse | 0.17 ± 0.38 (0) | 311.67 ± 142.38 (374) | 0.17 ± 0.38 (0) |
Table 3: Performance metrics for TC segmentation on the test cohort; values are mean ± standard deviation (median).

Teams | Dice Similarity Coefficient | 95% Hausdorff Distance | Sensitivity
---|---|---|---|
CNMCPMI2023 | 0.81 ± 0.18 (0.85) | 21.82 ± 75.15 (5.26) | 0.76 ± 0.18 (0.81) |
NVAUTO | 0.78 ± 0.19 (0.85) | 27.10 ± 72.11 (5.08) | 0.74 ± 0.14 (0.77) |
SherlockZyb | 0.77 ± 0.21 (0.83) | 22.28 ± 75.08 (5.70) | 0.68 ± 0.21 (0.72) |
Blackbean | 0.79 ± 0.14 (0.83) | 22.02 ± 52.15 (5.83) | 0.75 ± 0.11 (0.79) |
BiomedMBZ | 0.77 ± 0.20 (0.82) | 30.36 ± 82.47 (7.21) | 0.73 ± 0.19 (0.92) |
SchlaugLab | 0.70 ± 0.20 (0.75) | 31.61 ± 82.20 (7.00) | 0.61 ± 0.20 (0.65) |
VisionLabODU23 | 0.71 ± 0.25 (0.81) | 38.40 ± 71.01 (6.5) | 0.70 ± 0.23 (0.77) |
Isahajmistry | 0.32 ± 0.29 (0.30) | 130.25 ± 135.52 (41.59) | 0.28 ± 0.25 (0.21) |
UMNiverse | 0.25 ± 0.22 (0.19) | 55.15 ± 107.21 (16.35) | 0.19 ± 0.16 (0.14) |
Table 4: Performance metrics for WT segmentation on the test cohort; values are mean ± standard deviation (median).

Teams | Dice Similarity Coefficient | 95% Hausdorff Distance | Sensitivity
---|---|---|---|
CNMCPMI2023 | 0.83 ± 0.18 (0.87) | 20.86 ± 75.31 (4.24) | 0.76 ± 0.18 (0.81) |
NVAUTO | 0.84 ± 0.16 (0.87) | 18.05 ± 62.77 (4.30) | 0.80 ± 0.09 (0.82) |
SherlockZyb | 0.83 ± 0.17 (0.87) | 6.11 ± 4.50 (4.79) | 0.75 ± 0.17 (0.80) |
Blackbean | 0.81 ± 0.15 (0.86) | 23.56 ± 61.60 (4.95) | 0.78 ± 0.07 (0.80) |
BiomedMBZ | 0.78 ± 0.20 (0.82) | 30.45 ± 82.46 (6.85) | 0.72 ± 0.19 (0.77) |
SchlaugLab | 0.79 ± 0.18 (0.84) | 22.36 ± 75.20 (5.92) | 0.71 ± 0.18 (0.75) |
VisionLabODU23 | 0.68 ± 0.26 (0.81) | 68.18 ± 69.56 (6.35) | 0.73 ± 0.21 (0.79) |
Isahajmistry | 0.35 ± 0.23 (0.29) | 221.95 ± 99.53 (250.67) | 0.76 ± 0.20 (0.81) |
UMNiverse | 0.60 ± 0.25 (0.70) | 72.62 ± 107.48 (10.69) | 0.61 ± 0.16 (0.59) |

4 Discussion
The results of the BraTS-PEDs 2023 challenge provide valuable insights into different automated volumetric segmentation methods for pediatric brain tumors. The comprehensive evaluation metrics and rigorous validation phase in this challenge highlight the strengths and weaknesses of the various methods, offering a clear benchmark for future developments.
4.1 Overview of the Challenge Results
In 2022, the BraTS Glioma challenge tested top-performing methods—developed for adult glioma segmentation—on a small set of unseen pediatric brain tumor data. The results revealed that although models trained on adult brain tumors achieved high Dice scores for segmenting the whole pediatric tumor from mpMRI scans, their performance on tumor subregions (such as ET or NC) was poor. This confirmed our hypothesis that effective pediatric segmentation techniques require training on a curated and annotated pediatric brain tumor dataset. However, as with any machine learning and deep learning endeavor, developing a model that generalizes well necessitates a large, standardized dataset. To address this, the BraTS-PEDs 2023 initiative has compiled the most extensive annotated collection of pediatric high-grade gliomas, including astrocytomas and DMGs.
The top-performing methods at BraTS-PEDs 2023 included an ensemble method combining nnU-Net and Swin UNETR, Auto3DSeg using the SegResNet algorithm, and an extension of the nnU-Net model with self-supervised pretraining integrated with an adaptive region-specific loss. All of these methods are based on the U-Net convolutional neural network architecture. The models achieved mean Dice scores of 0.83-0.84 for WT segmentation and slightly lower yet respectable Dice scores of 0.77-0.81 for TC segmentation. However, lower and more variable performance was observed in the segmentation of the ET region, with mean Dice scores of 0.65, 0.63, and 0.55 across the three top-performing teams, which emphasizes the unique challenges posed by pediatric brain tumors. The small volume or absence of the ET region in DMG likely contributes to the greater variability and skewness in the Dice score distribution for this subregion. This finding indicates that while most methods can reliably locate the WT within MRI scans, accurately distinguishing between different tumor compartments remains a significant challenge in pediatric brain tumors.
The examples in the manuscript demonstrate that no single algorithm consistently outperforms others across all cases. The variability in performance, especially in challenging scenarios, highlights the need for continued research and development in this field. Specifically, ensembling the top-performing models may achieve better overall performance for segmentation of multi-institutional data by leveraging their individual strengths and mitigating their weaknesses. For instance, a method with high sensitivity, despite relatively lower Dice and higher HD95 scores, can be useful when it is critical not to miss any positive regions (such as in ET segmentation). By combining this method with others, it can be refined to reduce false positives and improve boundary accuracy.
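As an illustration of this idea, the following sketches a simple label-wise majority vote over the outputs of several models. This is a generic ensembling strategy under stated assumptions, not the method of any participating team.

```python
# Sketch of a label-wise majority-vote ensemble over model outputs.
import numpy as np

def majority_vote(segmentations, labels=(1, 2, 3)):
    """Majority vote over integer label maps of identical shape.
    Overlapping majorities (rare) are resolved in the order of `labels`."""
    stack = np.stack(segmentations)              # (models, x, y, z)
    out = np.zeros(stack.shape[1:], dtype=np.uint8)
    for lab in labels:                           # NC, ED, ET per challenge order
        votes = (stack == lab).sum(axis=0)
        out[votes > stack.shape[0] // 2] = lab   # strict majority wins
    return out
```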
4.2 Clinical Implications
The advancements in segmentation algorithms, as showcased by the BraTS-PEDs 2023 challenge, have several clinical implications. Detailed automated segmentations can enable radiation oncologists to better target treatment areas while sparing healthy tissue, ultimately reducing side effects (Bibault and Giraud, 2024). Furthermore, the integration of such automated tools into clinical workflows may expedite diagnostic processes and reduce inter-observer variability, facilitating more standardized and timely treatment decisions. Importantly, consistent monitoring facilitated by these algorithms could lead to prompt interventions, which is particularly crucial in pediatric populations where timely treatment is essential to reduce long-term adverse developmental impacts. Similarly, accurate tumor delineation can improve surgical planning by providing neurosurgeons with precise maps of tumor boundaries, thereby improving resection strategies and minimizing damage to critical brain structures (Sulangi et al., 2024). Future work should focus on prospective clinical validations to assess the real-world benefits and cost-effectiveness, ensuring they contribute meaningfully to patient outcomes and personalized treatment strategies.
4.3 Limitations
Unlike previous challenges (e.g., BraTS (Baid et al., 2021; Moawad et al., 2023; LaBella et al., 2024b)) that focused on adult populations, the BraTS‐PEDs 2023 challenge specifically addressed brain tumor segmentation in pediatric patients, who present unique challenges such as age-related anatomical variations. Nonetheless, our analysis reaffirms common limitations of current deep learning paradigms, including limited generalizability across diverse acquisition protocols and dependence on large, well-annotated datasets, which continue to impede robust clinical translation.
BraTS-PEDs 2023 also faced specific limitations, including a small dataset size and a narrow range of pediatric brain tumor histologies. For instance, the statistical analysis comparing the top teams revealed no significant performance differences, which may be attributed to the small sample size of the test cohort (n = 24). Future challenges would benefit from larger test cohorts to better discern performance differences and validate algorithm robustness.
In addition, to align with the rest of the BraTS 2023 cluster of challenges, our evaluation focused on segmenting the enhancing tumor, tumor core, and whole tumor. This focus may limit applicability of these methods for response assessment in pediatric brain tumors. Recognizing the distinct differences between adult and pediatric brain tumors (Mackay et al., 2017), especially in components critical for response assessment (Mackay et al., 2017; Thakkar et al., 2014; Familiar et al., 2024), we will customize future evaluations to better address pediatric tumor characteristics. Furthermore, the typically small size of ET regions in DMG tumors prompted us to set a threshold of 50 voxels for the detection of a true positive, which might have impacted the accuracy of ET predictions. Moving forward, we intend to adopt a more data-driven approach to setting this threshold in subsequent challenges.
4.4 Future Directions
Encouraged by the results of the BraTS-PEDs 2023 challenge, our next steps include expanding the dataset to include subjects from additional institutions and incorporating various histologies of high- and low-grade pediatric glioma. This initiative will offer the research community access to a comprehensive dataset of rare pediatric tumors with curated mpMRI and annotation, thereby aiding in the development of advanced tools for computer-assisted treatment planning and prognosis. Future data collection will also encompass post-operative and post-treatment scans and clinical outcomes. The data is publicly available at www.synapse.org/Synapse:syn51156910/wiki/627000.
5 Conclusion
The BraTS-PEDs 2023 challenge provided an open-access, curated, annotated dataset of multi-sequence MRIs of pediatric brain tumors. The challenge also provided the platform to evaluate methods for automated volumetric segmentation of pediatric brain tumors, thereby supporting the development of techniques that enhance decision support systems for assessing treatment responses and predicting the outcomes of these rare conditions. This manuscript presented the overview and results of the BraTS-PEDs 2023 challenge.
Acknowledgments
Success in any challenging medical domain depends upon the quality of well annotated multi-institutional datasets. We are grateful to all the data contributors, annotators, and approvers for their time and efforts. Our profound thanks go to the Children’s Brain Tumor Network (CBTN), the Collaborative Network for Neuro-oncology Clinical Trials (CONNECT), the International DIPG/DMG Registry (IDIPGR), the American Society of Neuroradiology (ASNR), and the Medical Image Computing and Computer Assisted Intervention (MICCAI) Society for their invaluable support of this challenge.
Research reported in this publication was partly supported by the National Institutes of Health (NIH) award: NCI/ITCR:U01CA242871 and NCI:UH3CA236536, and by grant funding from the Pediatric Brain Tumor Foundation and DIPG/DMG Research Funding Alliance (DDRFA). The content of this publication is solely the responsibility of the authors and does not represent the official views of the NIH.
Ethical Standards
For this retrospective study, informed consent had been obtained from all subjects at their respective institutions, or a waiver of informed consent was approved by the local institutional review board. The protocol for releasing the data was approved by the institutional review board of the data-contributing institution. All patient data were fully de-identified and stripped of identifiers before sharing with the organizing team. Further de-identification was performed by renaming all images and the respective ground truth segmentations, and by defacing all MRI images through skull-stripping before sharing with the participants.
Conflicts of Interest
None of the authors have any conflicts of interest related to this manuscript.
References
- Artzi et al. (2020) M. Artzi et al. Automatic segmentation, classification, and follow‐up of optic pathway gliomas using deep learning and fuzzy c‐means clustering based on mri. Medical Physics, 47(11):5693–5701, 2020.
- Baid et al. (2021) U. Baid et al. The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv:2107.02314, 2021.
- Bakas et al. (2017) S. Bakas et al. Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Scientific data, 4(1):1–13, 2017.
- Bakas et al. (2018) S. Bakas et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv:1811.02629, 2018.
- Bareja et al. (2024) R. Bareja et al. nnu-net–based segmentation of tumor subcompartments in pediatric medulloblastoma using multiparametric mri: A multi-institutional study. Radiology: Artificial Intelligence, 6(5):e230115, 2024.
- Bibault and Giraud (2024) J.-E. Bibault and P. Giraud. Deep learning for automated segmentation in radiotherapy: a narrative review. British Journal of Radiology, 97(1153):13–20, 2024.
- Boyd et al. (2024) A. Boyd et al. Stepwise transfer learning for expert-level pediatric brain tumor mri segmentation in a limited data scenario. Radiology: Artificial Intelligence, 6(4):e230254, 2024.
- Capellán-Martín et al. (2023) D. Capellán-Martín et al. Model ensemble for brain tumor segmentation in magnetic resonance imaging. In Proceedings of International Challenge on Cross-Modality Domain Adaptation for Medical Image Segmentation, pages 221–232, 2023.
- Cooney et al. (2020) T.M. Cooney et al. Response assessment in diffuse intrinsic pontine glioma: recommendations from the response assessment in pediatric neuro-oncology (rapno) working group. The Lancet Oncology, 21(6):e330–e336, 2020.
- Davatzikos et al. (2018) C. Davatzikos et al. Cancer imaging phenomics toolkit: quantitative imaging analytics for precision diagnostics and predictive modeling of clinical outcome. Journal of medical imaging, 5(1):011018–011018, 2018.
- Ellingson et al. (2022) B.M. Ellingson et al. Volumetric measurements are preferred in the evaluation of mutant idh inhibition in non-enhancing diffuse gliomas: evidence from a phase i trial of ivosidenib. Neuro-Oncology, 24(5):770–778, 2022.
- Erker et al. (2020) C. Erker et al. Response assessment in paediatric high-grade glioma: recommendations from the response assessment in pediatric neuro-oncology (rapno) working group. The Lancet Oncology, 21(6):e317–e329, 2020.
- Familiar et al. (2024) A.M. Familiar et al. Towards consistency in pediatric brain tumor measurements: Challenges, solutions, and the role of ai-based segmentation. Neuro-Oncology, page noae093, 2024.
- Fathi Kazerooni et al. (2023a) A. Fathi Kazerooni et al. Automated tumor segmentation and brain tissue extraction from multiparametric mri of pediatric brain tumors: A multi-institutional study. Neuro-Oncology Advances, 5(1):vdad027, 2023a.
- Gandhi et al. (2024) D.B. Gandhi et al. Automated pediatric brain tumor imaging assessment tool from cbtn: Enhancing suprasellar region inclusion and managing limited data with deep learning. Neuro-Oncology Advances, 6(1):vdae190, 2024.
- Gilmore et al. (2007) J.H. Gilmore et al. Early postnatal development of corpus callosum and corticospinal white matter assessed with quantitative tractography. American Journal of Neuroradiology, 28(9):1789–1795, 2007.
- Hashmi et al. (2024) S. Hashmi et al. Optimizing brain tumor segmentation with mednext: Brats 2024 ssa and pediatrics. arXiv:2411.15872, 2024.
- Hossain et al. (2021) M.J. Hossain et al. Epidemiology and prognostic factors of pediatric brain tumor survival in the us: Evidence from four decades of population data. Cancer Epidemiology, 72:101942, 2021.
- Huang et al. (2023) Z. Huang et al. Evaluating stu-net for brain tumor segmentation. In Proceedings of International Challenge on Cross-Modality Domain Adaptation for Medical Image Segmentation, pages 140–151, 2023.
- Isensee et al. (2021) F. Isensee et al. nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2):203–211, 2021.
- Jansen et al. (2015) M.H. Jansen et al. Survival prediction model of children with diffuse intrinsic pontine glioma based on clinical and radiological criteria. Neuro-Oncology, 17(1):160–166, 2015.
- Javaji et al. (2023) S.R. Javaji et al. Automated ensemble-based segmentation of pediatric brain tumors: A novel approach using the cbtn-connect-asnr-miccai brats-peds 2023 challenge data. arXiv:2308.07212, 2023.
- Karargyris et al. (2023) A. Karargyris et al. Federated benchmarking of medical artificial intelligence with medperf. Nat Mach Intell, 5:799–810, 2023.
- Kazerooni et al. (2023b) A.F. Kazerooni et al. The brain tumor segmentation (brats) challenge 2023: Focus on pediatrics (cbtn-connect-dipgr-asnr-miccai brats-peds). arXiv, 2023b.
- Kharaji et al. (2024) M. Kharaji et al. Brain tumor segmentation with advanced nnu-net: Pediatrics and adults tumors. Neuroscience Informatics, page 100156, 2024.
- Kofler et al. (2023) F. Kofler et al. The brain tumor segmentation (brats) challenge 2023: Local synthesis of healthy brain tissue via inpainting. arXiv:2305.08992, 2023.
- LaBella et al. (2023) D. LaBella et al. The asnr-miccai brain tumor segmentation (brats) challenge 2023: Intracranial meningioma. arXiv:2305.07642, 2023.
- LaBella et al. (2024a) D. LaBella et al. A multi-institutional meningioma mri dataset for automated multi-sequence image segmentation. Scientific Data, 11(1):496, 2024a.
- LaBella et al. (2024b) D. LaBella et al. Analysis of the brats 2023 intracranial meningioma segmentation challenge. arXiv:2405.09787, 2024b.
- Lazow et al. (2022) M.A. Lazow et al. Volumetric endpoints in diffuse intrinsic pontine glioma: comparison to cross-sectional measures and outcome correlations in the international dipg/dmg registry. Neuro-Oncology, 24(9):1598–1608, 2022.
- Li et al. (2023) H.B. Li et al. The brain tumor segmentation (brats) challenge 2023: Brain mr image synthesis for tumor segmentation (brasyn). arXiv, 2023.
- Lilly et al. (2023) J.V. Lilly et al. The children’s brain tumor network (cbtn)-accelerating research in pediatric central nervous system tumors through collaboration and open science. Neoplasia, 35:100846, 2023.
- Liu et al. (2023) X. Liu et al. From adult to pediatric: deep learning-based automatic segmentation of rare pediatric brain tumors. In Proceedings of SPIE Medical Imaging: Computer-Aided Diagnosis, volume 12465, page 1246505, 2023.
- Maani et al. (2023) F. Maani et al. Advanced tumor segmentation in medical imaging: An ensemble approach for brats 2023 adult glioma and pediatric tumor tasks. In Proceedings of International Challenge on Cross-Modality Domain Adaptation for Medical Image Segmentation, pages 264–277, 2023.
- Mackay et al. (2017) A. Mackay et al. Integrated molecular meta-analysis of 1,000 pediatric high-grade and diffuse intrinsic pontine glioma. Cancer cell, 32(4):520–537, 2017.
- Mansoor et al. (2016) A. Mansoor et al. Deep learning guided partitioned shape model for anterior visual pathway segmentation. IEEE transactions on medical imaging, 35(8):1856–1865, 2016.
- Mantha et al. (2023) K.B. Mantha et al. Automated 3d tumor segmentation using temporal cubic patchgan (tcup-gan). In Proceedings of International Challenge on Cross-Modality Domain Adaptation for Medical Image Segmentation, pages 152–164, 2023.
- Menze et al. (2014) B.H. Menze et al. The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging, 34(10):1993–2024, 2014.
- Mistry et al. (2023) S.K. Mistry et al. Multiscale encoder and omni-dimensional dynamic convolution enrichment in nnu-net for brain tumor segmentation. In Proceedings of International Challenge on Cross-Modality Domain Adaptation for Medical Image Segmentation, pages 278–290, 2023.
- Moawad et al. (2023) A.W. Moawad et al. The brain tumor segmentation (brats-mets) challenge 2023: Brain metastasis segmentation on pre-treatment mri. arXiv, 2023.
- Myronenko et al. (2023) A. Myronenko et al. Auto3dseg for brain tumor segmentation from 3d mri in brats 2023 challenge. In Proceedings of MICCAI Brain Tumor Segmentation (BraTS) Challenge, 2023.
- Nalepa et al. (2022) J. Nalepa et al. Segmenting pediatric optic pathway gliomas from mri using deep learning. Computers in Biology and Medicine, 142:105237, 2022.
- Ostrom et al. (2022) Q.T. Ostrom et al. Cbtrus statistical report: pediatric brain tumor foundation childhood and adolescent primary brain and other central nervous system tumors diagnosed in the united states in 2014–2018. Neuro-oncology, 24(Supplement_3):iii1–iii38, 2022.
- Pati et al. (2020) S. Pati et al. The cancer imaging phenomics toolkit (captk): technical overview. In Proceedings of MICCAI Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries Workshop, 2020.
- Pati et al. (2022) S. Pati et al. Federated learning enables big data for rare cancer boundary detection. Nature communications, 13(1):7346, 2022.
- Pati et al. (2023) S. Pati et al. Gandlf: the generally nuanced deep learning framework for scalable end-to-end clinical workflows. Communications Engineering, 2:23, 2023.
- Peng et al. (2022) J. Peng et al. Deep learning-based automatic tumor burden assessment of pediatric high-grade gliomas, medulloblastomas, and other leptomeningeal seeding tumors. Neuro-Oncology, 24(2):289–299, 2022.
- Rathore et al. (2018) S. Rathore et al. Brain cancer imaging phenomics toolkit (brain-captk): an interactive platform for quantitative analysis of glioblastoma. In Proceedings of MICCAI Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries Workshop, 2018.
- Rohlfing et al. (2010) T. Rohlfing et al. The sri24 multichannel atlas of normal adult human brain structure. Human brain mapping, 31(5):798–819, 2010.
- Sulangi et al. (2024) A.J. Sulangi et al. Neuronavigation in glioma resection: current applications, challenges, and clinical outcomes. Frontiers in Surgery, 11:1430567, 2024.
- Temtam et al. (2023) A. Temtam et al. Pediatric brain tumor segmentation using multiresolution fractal deep neural network. In Proceedings of International Challenge on Cross-Modality Domain Adaptation for Medical Image Segmentation, pages 332–340, 2023.
- Thakkar et al. (2014) J.P. Thakkar et al. Epidemiologic and molecular prognostic review of glioblastoma. Cancer Epidemiol Biomarkers Prev, 23(10):1985–1996, 2014.
- Tor-Diez et al. (2020) C. Tor-Diez et al. Unsupervised mri homogenization: application to pediatric anterior visual pathway segmentation. In Proceedings of MICCAI Machine Learning in Medical Imaging (MLMI) Workshop, volume 12436, pages 180–188, 2020.
- Vossough et al. (2022) A. Vossough et al. Training and comparison of nnu-net and deepmedic methods for autosegmentation of pediatric brain tumors. American Journal of Neuroradiology, 2022.
- Yushkevich et al. (2006) P.A. Yushkevich et al. User-guided 3d active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage, 31(3):1116–1128, 2006.
- Zhou et al. (2023) Y. Zhou et al. Brain tumor segmentation based on self-supervised pre-training and adaptive region-specific loss. In Proceedings of International Challenge on Cross-Modality Domain Adaptation for Medical Image Segmentation, pages 46–57, 2023.