Image of gut and intestines showing colorectal cancer highlighted in red.
Credit: Raycat/Getty Images

A “local” artificial intelligence framework provides more personalized information about how the microbiome influences risk of colorectal cancer, according to a research team from the Tokyo Institute of Technology. Their work separates patients into four distinct subgroups.

“Local explanation techniques make it possible to discover the most contributing bacteria for each individual CRC [colorectal cancer] patient, enabling us to examine inter-individual differences between subjects within a disease group,” explains associate professor Takuji Yamada, the senior author of the study.

Their paper was recently published in Genome Biology.

The gut microbiome comprises many different bacterial species that are essential to human health. In recent years, scientists across several fields have found that changes in the gut microbiome can be linked to a wide variety of diseases, including colorectal cancer. Higher abundance of certain bacteria, such as Fusobacterium nucleatum and Parvimonas micra, is typically associated with colorectal cancer progression.

Colorectal cancer is the third most common cancer worldwide and the second deadliest. An estimated 1.9 million new cases are diagnosed each year, and the disease caused an estimated 0.9 million deaths in 2020.

Various artificial intelligence models have been developed to help determine which bacterial species are useful as colorectal cancer biomarkers. However, the researchers say, most of these models rely on what are known as “global explanations,” which summarize how important each bacterial species is across the entire dataset. Such summaries can miss bacterial species that are relevant biomarkers only for smaller, less-representative groups of patients.
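To make the global/local distinction concrete, the short Python sketch below contrasts the two views on a made-up matrix of per-subject contribution scores (rows are subjects, columns are bacterial species). The species names and numbers are hypothetical, and the global summary used here, the mean absolute contribution per species, is one common convention rather than the method of any particular study.

```python
# Hypothetical per-subject contribution scores (rows: subjects, columns: species).
SPECIES = ["species_A", "species_B", "species_C"]
CONTRIB = [
    [0.40, 0.15, 0.00],
    [0.35, 0.20, 0.00],
    [0.30, 0.18, 0.01],
    [0.38, 0.22, 0.00],
    [0.02, 0.01, 0.45],  # a subject whose prediction is driven by species_C
]


def global_ranking(contrib, names):
    """Global view: rank species by mean absolute contribution over all subjects."""
    n = len(contrib)
    scores = {
        name: sum(abs(row[j]) for row in contrib) / n for j, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)


def local_top_species(contrib, names):
    """Local view: the largest-magnitude contributor for each individual subject."""
    return [names[max(range(len(names)), key=lambda j: abs(row[j]))] for row in contrib]


print("Global ranking:", global_ranking(CONTRIB, SPECIES))
print("Top species per subject:", local_top_species(CONTRIB, SPECIES))
```

Globally, species_C ranks last, yet it is the single most important contributor for the fifth subject; only the per-subject view reveals this.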

The team used a framework called “Shapley additive explanations” (SHAP), which originated from a concept in game theory called the Shapley value. The Shapley value specifies how a payout should be distributed among the players of a coalition according to each player’s contribution. By analogy, the team used SHAP to calculate how much each bacterial species contributed to each individual CRC prediction.
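For a model with only a few inputs, Shapley values can be computed exactly by averaging each feature’s marginal contribution over every possible coalition of the other features. The Python sketch below does this for a single subject. It assumes a hypothetical three-species risk score and uses the common convention that features outside a coalition are set to a baseline value. Practical SHAP implementations rely on faster exact algorithms or approximations, because full enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input by enumerating every coalition.

    Features outside a coalition take their baseline value (an assumed,
    commonly used convention for defining a coalition's payout).
    """
    n = len(x)

    def payout(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when it joins coalition S
                total += weight * (payout(s | {i}) - payout(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score over three species abundances (not a real model)."""
    return 0.6 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


subject = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, subject, baseline)
print([round(v, 3) for v in phi])
print(round(sum(phi), 3), round(toy_risk(subject) - toy_risk(baseline), 3))
```

Here the interaction term 0.2 * z[0] * z[2] is split equally between species 0 and 2, giving contributions of 0.7, 0.3 and 0.1. These sum to the difference between the subject’s score and the baseline score (1.1), which is the additivity property that gives SHAP its name.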

Applying this approach to five publicly available datasets, the researchers found that projecting the SHAP values into a two-dimensional (2D) space revealed a clear separation between healthy subjects and those with colorectal cancer.
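The article does not say which projection method was used, so the Python sketch below substitutes principal component analysis (PCA) to map a matrix of SHAP values (rows are subjects, columns are species) onto two dimensions. PCA is computed here with power iteration to stay within the standard library; the SHAP values are invented for illustration, and real analyses would normally use a dedicated library.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center_columns(rows):
    """Subtract each column's mean so the data are centred at the origin."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]


def leading_eigenpair(matrix, iters=1000, seed=0):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iters):
        w = [dot(row, v) for row in matrix]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:  # nothing left to find in this direction
            break
        v = [x / norm for x in w]
    eigval = dot(v, [dot(row, v) for row in matrix])  # Rayleigh quotient
    return eigval, v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    x = center_columns(rows)
    cov = covariance(x)
    d = len(cov)
    l1, v1 = leading_eigenpair(cov)
    # Deflate: remove the first component so power iteration finds the second.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    _, v2 = leading_eigenpair(deflated)
    return [(dot(r, v1), dot(r, v2)) for r in x]


# Hypothetical SHAP values: three healthy subjects, then three CRC subjects.
SHAP_MATRIX = [
    [-0.20, -0.10, 0.02],
    [-0.25, -0.05, -0.01],
    [-0.15, -0.12, 0.00],
    [0.30, 0.20, 0.05],
    [0.35, 0.15, -0.02],
    [0.28, 0.25, 0.03],
]
LABELS = ["healthy"] * 3 + ["CRC"] * 3

for label, (pc1, pc2) in zip(LABELS, pca_2d(SHAP_MATRIX)):
    print(f"{label:8s} PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

With these values, the healthy and CRC subjects fall on opposite sides of the first principal component; the sign of each component is arbitrary, so only the separation is meaningful.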

Clustering the 2D projections yielded four distinct subgroups, which differed in colorectal cancer probability and in their associated bacteria. In addition, the team found that subjects in the highest-risk subgroups all had an enriched population of bacteria typically associated with the disease. Most notably, the results were consistent across all five datasets, suggesting that the method is widely applicable.
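The article does not name the clustering algorithm either. As an illustration only, the Python sketch below groups hypothetical 2D points (standing in for projected SHAP values) into four clusters with k-means, using a deterministic farthest-point initialization, and then reports each cluster’s mean predicted CRC probability. The coordinates and probabilities are invented.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point farthest
    # from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def mean_probability_by_cluster(labels, probs):
    """Average predicted CRC probability within each cluster."""
    totals = {}
    for lab, p in zip(labels, probs):
        s, n = totals.get(lab, (0.0, 0))
        totals[lab] = (s + p, n + 1)
    return {lab: s / n for lab, (s, n) in totals.items()}


# Hypothetical 2D coordinates (four well-separated groups of three subjects).
POINTS = [
    (-2.0, -2.0), (-2.1, -1.9), (-1.9, -2.2),
    (2.0, 2.0), (2.2, 1.9), (1.9, 2.1),
    (-2.0, 2.0), (-1.8, 2.1), (-2.1, 1.9),
    (2.0, -2.0), (2.1, -1.8), (1.9, -2.1),
]
CRC_PROB = [0.05, 0.10, 0.08, 0.90, 0.95, 0.85, 0.40, 0.35, 0.45, 0.60, 0.65, 0.55]

labels, _ = kmeans(POINTS, k=4)
for cluster, mean_p in sorted(mean_probability_by_cluster(labels, CRC_PROB).items()):
    print(f"cluster {cluster}: mean CRC probability {mean_p:.2f}")
```

Each group of three points lands in its own cluster, and the clusters differ in mean predicted probability, mirroring the kind of risk-stratified subgroups described above. The number of clusters is fixed at four here by assumption; in practice it would be chosen from the data.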

“Considering the increasing use of machine learning in microbiome–disease association studies, our novel method could be beneficial for a more personalized microbiome data exploration as well as help uncover potential disease subgroups along with their potential associated biomarkers,” said Yamada.

The technique is also applicable to other diseases with known links to the gut microbiome, such as ulcerative colitis, Crohn’s disease, and diabetes.
