Researchers at the University of California San Diego (UCSD) School of Medicine and Rady Children’s Institute for Genomic Medicine have created a deep learning tool that uncovers disease-causing mosaic mutations, which they say is a first step toward developing treatments for many diseases.
Mosaic mutations are present in only a very small percentage of human cells, which makes them difficult to detect using standard DNA sequencing and computational approaches. To overcome this, the investigators turned to deep learning to develop a tool that can learn from, and examine, vast troves of genetic data to find the tiny percentage of cells affected by mutations. The performance of the tool, called “DeepMosaic,” was detailed in research published yesterday in the journal Nature Biotechnology.
DeepMosaic was trained on 180,000 simulated or experimentally assessed mosaic variants (MVs), and was benchmarked on 619,740 simulated MVs and 530 independent biologically tested MVs from 16 genomes and 181 exomes. The tool functions in ways similar to human visual processing, but with much greater accuracy and attention to detail, and it improves on existing computational methods for detecting noncancer MVs, which often go undetected.
“One example of an unsolved disorder is focal epilepsy,” said Joseph Gleeson, MD, Rady professor of Neuroscience at UC San Diego School of Medicine and director of neuroscience research at the Rady Children’s Institute for Genomic Medicine and senior author of the paper. “Epilepsy affects 4% of the population, and about one-quarter of focal seizures fail to respond to common medication. These patients often require surgical excision of the short-circuited focal part of the brain to stop seizures. Among these patients, mosaic mutations within the brain can cause epileptic focus.
“We have had many epilepsy patients where we were not able to spot the cause, but once we applied our method, called ‘DeepMosaic,’ to the genomic data, the mutation became obvious. This has allowed us to improve the sensitivity of DNA sequencing in certain forms of epilepsy and has led to discoveries that point to new ways to treat brain disease.”
DeepMosaic’s training used information that included known MVs as well as many normal DNA sequences, allowing the machine-learning technology to learn the differences between them. Development of the tool was an iterative process of continually retraining it with increasingly complex datasets and selecting among a dozen models. The computer was eventually able to identify mosaic mutations much better than human eyes and prior methods. It was also tested on large independent datasets not used in the training, where it outperformed prior analysis and detection methods.
In their research, the team showed that DeepMosaic performed with a sensitivity of 0.78, specificity of 0.83 and positive predictive value of 0.96 on noncancer whole-genome sequencing data. On noncancer whole-exome sequencing data, it more than doubled the validation rate of previous best-practice methods (0.43 versus 0.18).
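For readers unfamiliar with these metrics, the following minimal Python sketch shows how sensitivity, specificity and positive predictive value are conventionally computed from counts of correct and incorrect calls. The counts used here are hypothetical and chosen only for illustration; they are not the study's data, and the function names are not part of DeepMosaic. Only the two validation rates (0.43 and 0.18) come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of variant calls compared against an experimental truth set."""
    tp: int  # true positives: real mosaic variants that were called
    fp: int  # false positives: calls that failed validation
    tn: int  # true negatives: non-variants correctly rejected
    fn: int  # false negatives: real mosaic variants that were missed


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of real variants that the classifier detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-variants that the classifier correctly rejected."""
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Fraction of positive calls that turned out to be real variants."""
    return c.tp / (c.tp + c.fp)


def fold_change(new_rate: float, old_rate: float) -> float:
    """How many times larger the new rate is than the old one."""
    return new_rate / old_rate


if __name__ == "__main__":
    # Hypothetical counts for illustration only (NOT from the study).
    counts = ConfusionCounts(tp=90, fp=20, tn=80, fn=10)
    print(f"sensitivity: {sensitivity(counts):.2f}")
    print(f"specificity: {specificity(counts):.2f}")
    print(f"PPV:         {positive_predictive_value(counts):.2f}")

    # Validation rates reported in the article for whole-exome data.
    print(f"fold change: {fold_change(0.43, 0.18):.2f}")
```

The last line divides the two reported validation rates, which gives a ratio of roughly 2.4.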
“DeepMosaic surpassed traditional tools in detecting mosaicism from genomic and exonic sequences,” said Xin Xu, a former undergraduate research assistant at UC San Diego School of Medicine and co-first author. “The prominent visual features picked up by the deep learning models are very similar to what experts are focusing on when manually examining variants.”
The team noted that DeepMosaic is an accurate MV classifier for noncancer samples that can be implemented as an alternative or complement to existing methods. To advance adoption of the tool and spur further research, the UCSD and Rady team has made DeepMosaic freely available via an open-source platform that can enable other researchers to train their own neural networks to achieve more targeted detection of mutations using a similar image-based method.