[Image: Glioblastoma brain cancer, CT scan. Credit: DR P. MARAZZI/SCIENCE PHOTO LIBRARY/Getty Images]

A team has developed a federated machine learning model for glioblastoma based on brain scan data from patients at over 70 institutions, without compromising patient privacy. The model can improve identification and prediction of boundaries in three brain tumor sub-compartments.

The team, led by researchers at Penn Medicine and Intel Corporation, aggregated brain scan data from 6,314 glioblastoma (GBM) patients at 71 sites around the globe. Their findings were published today in Nature Communications.

Federated learning trains a machine learning algorithm across multiple decentralized devices or servers (in this case, institutions), without actually exchanging the data. It has been previously shown to allow clinicians at institutions in different countries to collaborate on research without sharing any private patient data.
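The aggregation step at the heart of this approach can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration of federated averaging (sites share only model parameters, never patient data, and the aggregator combines them weighted by each site's local sample count); it is not the authors' actual implementation.

```python
def federated_average(site_weights, site_sizes):
    """Combine per-site model parameters into a consensus model.

    Each site trains locally and shares only its parameter vector;
    the aggregator averages them, weighted by local dataset size,
    so no raw patient data ever leaves an institution.
    """
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical example: three sites with two-parameter "models"
# and different amounts of local data.
consensus = federated_average(
    site_weights=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    site_sizes=[100, 100, 200],
)
print(consensus)  # [3.5, 4.5]
```

Sites holding more cases pull the consensus toward their local solution, which is why aggregating 71 institutions can outperform any single-site model.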

“Data helps to drive discovery, especially in rare cancers where available data can be scarce. The federated approach we outline allows for access to maximal data while lowering institutional burdens to data sharing,” said Jill Barnholtz-Sloan, PhD, one of the authors and adjunct professor at Case Western Reserve University School of Medicine.

Glioblastoma is a rare but very aggressive type of brain cancer, accounting for almost half of all brain cancer cases. It’s estimated that more than 13,000 Americans will develop the condition in 2022. The disease is resistant to radiotherapy and chemotherapy but may respond to personalized treatment, which makes data-fueled research essential.

“This is the single largest and most diverse dataset of glioblastoma patients ever considered in the literature, and was made possible through federated learning,” said senior author Spyridon Bakas, PhD, an assistant professor of Pathology & Laboratory Medicine, and Radiology, at the Perelman School of Medicine at the University of Pennsylvania. “The more data we can feed into machine learning models, the more accurate they become, which in turn can improve our ability to understand, treat, and remove glioblastoma in patients with more precision.”

Researchers studying rare conditions, like GBM, are often limited to studying data from patients at their own institutions. Due to privacy protection legislation, such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in Europe, establishing data-sharing collaborations across institutions without compromising patient privacy is a major obstacle for many healthcare providers.

A Staged Process

The team’s model followed a staged approach. The first stage, called a “public initial model,” was pre-trained using publicly available data from the International Brain Tumor Segmentation (BraTS) challenge. The model was trained to find the boundaries of three GBM tumor sub-compartments: the “enhancing tumor” (ET), representing the vascular blood-brain barrier breakdown within the tumor; the “tumor core” (TC), which includes the ET plus the necrotic (dead) part of the tumor, and represents the region relevant to surgeons who remove it; and the “whole tumor” (WT), defined by the union of the TC and the infiltrated tissue, which is the whole area that would be treated with radiation.
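The nesting of the three sub-compartments (ET inside TC inside WT) can be made concrete with a small sketch. The label values below follow the BraTS convention (1 = necrotic core, 2 = infiltrated/edematous tissue, 4 = enhancing tumor); treating the segmentation as a flat list of per-voxel labels is a simplification for illustration.

```python
def sub_compartments(labels):
    """Derive the three nested tumor regions from per-voxel class labels.

    BraTS-style label convention (assumption):
      1 = necrotic core, 2 = infiltrated tissue, 4 = enhancing tumor.
    """
    et = [v == 4 for v in labels]            # enhancing tumor
    tc = [v in (1, 4) for v in labels]       # tumor core = ET + necrosis
    wt = [v in (1, 2, 4) for v in labels]    # whole tumor = TC + infiltration
    return et, tc, wt

# Toy 9-voxel segmentation map (0 = healthy tissue).
seg = [0, 2, 2, 2, 1, 4, 0, 2, 4]
et, tc, wt = sub_compartments(seg)
print(sum(et), sum(tc), sum(wt))  # 2 3 7
```

By construction every ET voxel is also a TC voxel, and every TC voxel is a WT voxel, mirroring the clinical definitions above.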

This first stage used data from 231 patient cases at 16 sites, and the resulting model was validated against the local data at each site. The second stage, called the “preliminary consensus model,” used the public initial model and incorporated more data from 2,471 patient cases from 35 sites, which improved its accuracy. The final stage, or “final consensus model,” incorporated the largest amount of data from 6,314 patient cases (3,914,680 images) at 71 sites, across six continents, to further optimize and test for generalizability.

Following model training, the final consensus model showed significant performance improvements when evaluated on the collaborators’ local validation data: a 27% improvement in detecting ET boundaries, 33% in detecting TC boundaries, and 16% in WT boundary detection.
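Boundary-detection quality in segmentation studies of this kind is typically scored with the Dice similarity coefficient, which measures the overlap between a predicted region and the reference annotation (the article does not name the metric, so this is an assumption about the evaluation, not a statement of the authors' method):

```python
def dice(pred, ref):
    """Dice similarity coefficient between two binary masks.

    pred and ref are flat lists of 0/1 per-voxel values;
    1.0 means perfect overlap, 0.0 means no overlap.
    """
    intersection = sum(p and r for p, r in zip(pred, ref))
    denom = sum(pred) + sum(ref)
    return 2.0 * intersection / denom if denom else 1.0

# Toy example: prediction overlaps the reference on 2 of its 3 voxels.
pred = [1, 1, 1, 0, 0]
ref  = [0, 1, 1, 1, 0]
print(dice(pred, ref))  # 0.6666666666666666
```

A relative gain in a score like this, computed per sub-compartment at each site, is the kind of improvement the 27%, 33%, and 16% figures describe.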
