SEQUOIA tool to predict gene expression in tumor biopsy slides
A new AI program, SEQUOIA, can analyze a microscopy image from a tumor biopsy (left, purple) and rapidly determine what genes are likely turned on and off in the cells it contains (gene expression shown in shades of red and blue on right). Credit: Emily Moskal/Stanford Medicine.

Stanford Medicine researchers have used artificial intelligence (AI) to create a tool that can predict gene expression levels within tumor cells using just standard microscopy images of the tumor biopsy.

The tool, named SEQUOIA (short for slide-based expression quantification using linearized attention), was created using data from 7,584 tumor samples across 16 cancer types and validated in two independent cohorts. The team reports in Nature Communications that it can accurately predict the expression levels of genes involved in key cancer processes, including the inflammatory response, the cell cycle, and metabolism, and that it can stratify patients with breast cancer by risk.

“With the growing interest in precision medicine, molecular profiling has gained significant attention as a critical component of prognostication and treatment planning,” write Stanford graduate student Marija Pizurica and co-authors.

They explain that although current methods for measuring gene expression have “deepened our understanding of cancer heterogeneity, leading to the discovery of molecular signatures associated with treatment sensitivity,” they are costly and time-consuming.

A more cost-effective approach may be to exploit the information already contained in digitized histopathology whole-slide images, the researchers suggest.

“Previous work has shown that digital pathology images of tissues are correlated with gene RNA variations,” senior author Olivier Gevaert, PhD, a professor of biomedical data science, told Inside Precision Medicine. “That inspired us to develop an AI model that uses the most recent technological innovations to see if we can exploit this correlation further and develop a model that predicts all genes across all tissues in the human body.”

After incorporating the 7,584 cancer biopsies and other datasets, including transcriptomic data and images from thousands of healthy cells, into SEQUOIA, the researchers found that the AI program accurately predicted the expression of an average of 15,344 (74%) of the 20,820 genes in the biopsy images across the 16 cancer types.

Furthermore, the number of well-predicted genes was positively correlated with the number of training samples available for each cancer type. The highest number of well-predicted genes was found in breast cancer (n=18,878), the cancer type with the most available slides (n=1,130). This was followed by thyroid cancer, with 18,758 well-predicted genes (n=517 slides), and kidney cancer, with 17,623 (n=514 slides).
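As a quick check on these percentages, the short Python snippet below divides each reported count of well-predicted genes by the 20,820 genes evaluated. The counts are the ones reported in the article; the helper name `fraction_well_predicted` is ours, chosen only for illustration.

```python
# Counts reported in the article; 20,820 is the total number of genes evaluated.
TOTAL_GENES = 20_820

well_predicted = {
    "average across 16 cancer types": 15_344,
    "breast (1,130 slides)": 18_878,
    "thyroid (517 slides)": 18_758,
    "kidney (514 slides)": 17_623,
}


def fraction_well_predicted(n_genes: int, total: int = TOTAL_GENES) -> float:
    """Return the share of all evaluated genes that were well predicted."""
    return n_genes / total


for label, n in well_predicted.items():
    # The '.1%' format multiplies by 100 and appends a percent sign.
    print(f"{label}: {n:,} genes = {fraction_well_predicted(n):.1%}")
```

This works out to roughly 73.7% on average, and about 90.7%, 90.1%, and 84.6% for breast, thyroid, and kidney cancer respectively.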

“Even though previous work showed there was a signal, the final model works much better than we expected,” said Gevaert. “In particular, for breast cancer where more than 18,000 genes can be predicted accurately. We definitely did not expect that it would work this well, and we are continuing our work on this project, so more improvements are possible.”

To test the utility of SEQUOIA for clinical decision making, Gevaert and colleagues identified a gene expression signature comprising 272 genes that are significantly associated with recurrence. Gene signatures like this are already used in commercial breast cancer genomic tests such as the FDA-approved MammaPrint test, which analyzes the levels of 70 breast-cancer-related genes to provide patients with a score to determine their risk for cancer recurrence.

When patients with breast cancer from three independent cohorts were stratified into high- and low-risk subgroups according to the SEQUOIA risk score, the researchers found that those assigned high-risk scores had significantly shorter recurrence-free survival than those with low-risk scores.
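The article does not say how the SEQUOIA risk score is calculated or where the high/low cutoff falls. Purely as a hedged illustration of how a gene-signature score can split patients into two groups, the sketch below assumes a hypothetical weighted sum of predicted expression over the signature genes and a split at the cohort median. The gene names, weights, and expression values are invented, and the real 272-gene signature is not reproduced.

```python
from __future__ import annotations

from statistics import median


def risk_score(expression: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical scoring rule: weighted sum of predicted expression over signature genes."""
    return sum(weights[gene] * expression[gene] for gene in weights)


def stratify(scores: dict[str, float]) -> dict[str, str]:
    """Assumed cutoff: patients above the cohort median are high risk, the rest low risk."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy data: three made-up signature genes with made-up weights and predicted expression.
weights = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 1.0},
    "P2": {"GENE_A": 0.2, "GENE_B": 1.5, "GENE_C": 0.1},
    "P3": {"GENE_A": 1.5, "GENE_B": 0.2, "GENE_C": 2.0},
    "P4": {"GENE_A": 0.5, "GENE_B": 1.0, "GENE_C": 0.4},
}

scores = {pid: risk_score(expr, weights) for pid, expr in patients.items()}
groups = stratify(scores)
print(groups)  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

Comparing recurrence-free survival between the resulting groups, as the researchers did, would additionally require follow-up data and a survival analysis, which this sketch leaves out.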

To make the data accessible and easy to interpret, the researchers programmed SEQUOIA to display its predictions as a visual map of the tumor biopsy, letting scientists and clinicians see how gene expression may differ across regions of a tumor.
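The article does not describe how SEQUOIA builds its map. One plausible approach, assumed here only for illustration, is to predict expression for small image tiles and place each tile's value back at its grid position on the slide. The tile coordinates, values, threshold, and the function names `expression_map` and `render` below are all hypothetical.

```python
from __future__ import annotations


def expression_map(
    tiles: dict[tuple[int, int], float], n_rows: int, n_cols: int
) -> list[list[float | None]]:
    """Place per-tile predicted expression into a 2D grid; cells without tissue stay None."""
    grid: list[list[float | None]] = [[None] * n_cols for _ in range(n_rows)]
    for (row, col), value in tiles.items():
        grid[row][col] = value
    return grid


def render(grid: list[list[float | None]], threshold: float) -> str:
    """Text heatmap: '+' above threshold, '-' at or below it, '.' for background."""
    return "\n".join(
        "".join("." if v is None else ("+" if v > threshold else "-") for v in row)
        for row in grid
    )


# Hypothetical predictions for one gene on a 2 x 3 grid of tiles; tile (0, 2) has no tissue.
tiles = {(0, 0): 0.9, (0, 1): 0.7, (1, 0): 0.2, (1, 1): 0.1, (1, 2): 0.8}
grid = expression_map(tiles, n_rows=2, n_cols=3)
print(render(grid, threshold=0.5))
# ++.
# --+
```

A real viewer would color each tile on a continuous scale over the slide image rather than thresholding it, but the core step of mapping tile predictions back to slide positions is the same.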

There are other tools that can visualize gene expression in biopsy slides, but Gevaert said that “SEQUOIA performs significantly better than previous work and has been applied to the largest possible tissues and validation data sets.”

He added: “The main difference is that we have integrated digital pathology foundation models in SEQUOIA. These are large models that have been trained on millions of tissue images similar to popular models such as GPT, LLAMA, and Gemini.”

The AI model can’t currently be used in a clinical setting as it does not have FDA approval. “The next step is to run studies that deploy SEQUOIA in clinical workflows and determine at what stage it benefits physicians the most,” noted Gevaert. “The likely first use case is to help pathologists read these images and provide additional information based on the digital images produced by SEQUOIA on relevant gene activity and gene signatures. This can help with diagnosis and provide also input to molecular tumor boards in a timely fashion to help determine treatment.”

He stressed that the tool is not limited to breast cancer. “With our model, we can predict any gene signature across any cancer type with little additional cost aside from the cost of running the model.”
