Microsoft and Paige have revealed the fruits of their collaborative labor in creating a widely applicable AI-powered cancer pathology tool.
A research article published in Nature Medicine demonstrates a model that can recognize patterns in diverse pathology images because it was trained on over a million digitized slides. This model, Virchow (named after Rudolf Virchow, the founder of cellular pathology), was then used to power downstream cancer detection applications. The Virchow-based tools performed on par with tissue-specific clinical-grade models and, in some cases, outperformed them at finding rare cancer variants. These results show that Virchow could be useful in various digital pathology settings that are usually limited by a lack of labeled training data.
Virchow, a new “foundation” for cancer diagnostics
Digital histological preparations, also known as whole slide images (WSIs), are gradually replacing their analog counterparts in light microscopy examinations. They are the basis of efforts in computational pathology, which applies artificial intelligence to aid in disease diagnosis, characterization, and understanding. In this developing field (the first AI pathology system received FDA approval in September 2021), a major aim is to decipher routine WSIs for previously unknown outcomes like prognosis and therapeutic response. These capabilities are possible due to computer vision’s remarkable performance gains and the development of foundation models: self-supervised algorithms built on massive unlabeled datasets that enable widespread applicability.
In early September of last year, Microsoft and Paige, a provider of AI-driven pathology solutions dedicated to improving cancer research and treatment, partnered to develop a foundation model for clinical-grade computational pathology and identify rare cancers. The resulting AI model, Virchow, was trained using data from around 100,000 patients, equivalent to about 1.5 million H&E-stained WSIs obtained from Memorial Sloan Kettering Cancer Center (MSKCC). This dataset contains four to ten times more images and 3,000 times more pixels than the datasets used to train commercially available AI models. From it, Virchow generates data representations, called embeddings, that can generalize well to diverse predictive tasks.
Virchow, initially previewed in a preprint in January of this year, was tested against various clinical-grade AI models. According to the Nature Medicine research article, Virchow’s performance generally matches these commercial models in pan-cancer detection and outperforms them in detecting rare cancers. This outcome is even more striking given that the pan-cancer model’s training dataset does not include the usual quality control and subpopulation enrichment of data and labels that is done for commercially available AI models.
Overall, Virchow unlocks the ability to accurately and precisely detect unusual histological variants of cancer and biomarker status, which is difficult to achieve with cancer- or biomarker-specific training due to the limited amount of associated training data. The results provide evidence that large-scale foundation models can be the basis for robust results in a new frontier of computational pathology.