One of the first things we learn when studying biology is that amino acids are building blocks of the cell. These simple organic compounds join together to form proteins that are essential for the regulation of the body’s tissues and organs.
Proteomics–the large-scale study of proteins–has been trying to understand how the numbers, sequences, combinations, and locations of these proteins within each cell impact biologic functions and disease pathology for many years. Different research groups each take their own approach; some want to know the whole amino acid sequence and some may be looking at protein identification or fingerprinting with a partial signature, while others are interested in counting the numbers of each type of protein in the cells.
Following the more mainstream adoption of genomics, interest in proteomics, and in the technologic advances that accompany it, has grown rapidly. Speaking to Inside Precision Medicine, Nikolai Slavov, PhD, Allen Distinguished Investigator and director of the single-cell proteomics center at Northeastern University in Boston, Massachusetts, pointed out that “in the year 2001, there was a seminal article published in Nature Biotechnology1 by John Yates where they were able to analyze a thousand proteins over the course of a day or two. Now, we can analyze many thousands of proteins in just a couple of minutes.” Slavov described the growth over the last 5–10 years as “huge,” with many of the most recent developments improving experimental design and data interpretation.
Although genomics has now been used to sequence the human genome, RNA abundance does not always, or even often, correlate with the abundance of the proteins that are translated from these RNAs by ribosomes in the cells. This is important because if you are looking for a novel protein drug target on a tumor, for example, it is the abundance of the proteins that matters, and RNA transcript data alone cannot provide that information. Proteomics can also tell you about cellular localization, numerous posttranslational modifications, and how a cell reacts to a particular challenge such as viral infection over time, whereas genomics only provides information on how the cells could react to that challenge.
Mass spectrometry–the gold standard of proteomics
Proteins can be measured in a number of ways in biologic samples, but the current state of the art and gold standard is mass spectrometry. “Essentially mass spectrometry is a technique which is based on measuring masses of molecules, which are converted into charged particles,” explained Alexander Makarov, PhD, director of global research LSMS at Thermo Fisher Scientific. Physicist Makarov is credited with the invention of the Orbitrap analyzer, which, following its initial presentation in 1999, became a game changer for biomolecular mass spectrometry–first in the area of mass spectrometry-based proteomics and then in the structural analysis of small molecules and the fight against doping2.
The technique typically takes a “bottom up” approach when used for proteomics, said Javier Alfaro, PhD, principal investigator in computational immunology at the University of Gdansk in Poland, and research fellow at the University of Edinburgh in the U.K. This means proteins are first cut down by enzymes into smaller peptides, which are then separated by liquid chromatography. The enzymes used to cleave the proteins do so at sites that are known to produce peptides likely to have a positive charge. The separated peptides are transferred to the gas phase using electrospray ionization, during which they enter the mass spectrometer. They are then separated by mass-to-charge ratio, commonly with an Orbitrap ion trap mass analyzer, which measures ion oscillation frequency, or by time-of-flight technologies that determine the time it takes for an ion to reach the detector. In both cases, the mass spectra of the ions are acquired and used to infer the peptides and then the proteins present.
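The digestion and mass-to-charge steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not name the enzyme, so the sketch assumes trypsin with a simplified rule (cut after lysine or arginine unless the next residue is proline); the function names and the example sequence are hypothetical; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # mass of each charge-carrying proton


def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P.

    Assumption: a simplified trypsin rule with no missed cleavages.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide, charge):
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


# Hypothetical sequence, chosen only to exercise the cleavage rule.
for pep in tryptic_digest("AAKGGRPLLKC"):
    print(pep, round(mz(pep, 1), 4), round(mz(pep, 2), 4))
```

Each peptide that ends in lysine or arginine carries a basic residue at its C-terminus, which is one reason such peptides readily pick up a positive charge during ionization.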
Much of the proteomics work carried out to date has been done using bulk samples consisting of a heterogeneous mixture of cell types that are often at different stages of differentiation. Although this approach can give information about the average state of the biologic system being studied, it may also obscure variability between cells and produce average readings that do not represent any of the individual cells in the system3.
Drilling down to single cells
Single-cell analysis, on the other hand, allows researchers to understand protein relationships specific to each of the cell types making up a complex tissue such as a tumor sample and can highlight the full dynamic range of variation across cells4.
“It may be that there’s a critical but small group of 100 cells that drive the clinical phenotype,” said Parag Mallick, founder of Nautilus Biotechnology and associate professor at Stanford University. “You see that a lot with cancer – you have a rare drug resistant cell population that is 0.1 percent of the tumor and if you study the tumor en masse you likely aren’t able to know they are present.”
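A toy calculation shows why a rare population can disappear in a bulk measurement. The numbers below are made up for illustration (1000 cells, one of which, or 0.1 percent, carries a protein at 100 times the usual level), and the function names are hypothetical.

```python
import statistics


def bulk_average(levels):
    """What a bulk measurement reports: the mean over all cells."""
    return sum(levels) / len(levels)


def outlier_cells(levels, fold=5.0):
    """Indices of cells at least `fold` times above the median cell."""
    median = statistics.median(levels)
    return [i for i, level in enumerate(levels) if level >= fold * median]


# Illustrative assumption: 999 ordinary cells at 10 units of a protein
# and one rare cell (0.1 percent of the sample) at 1000 units.
levels = [10.0] * 999 + [1000.0]

print("bulk average:", bulk_average(levels))     # close to the ordinary level
print("rare cells found:", outlier_cells(levels))  # visible only cell by cell
```

In this sketch the bulk average moves by only about 10 percent, a shift that could easily be mistaken for noise, while the per-cell view singles out the one cell that is 100-fold higher.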
Traditional single-cell protein analysis methods, such as mass cytometry, cellular indexing of transcriptomes and epitopes by sequencing, protein sequencing via RNA expression, and CO-Detection by indEXing, rely on antibodies barcoded with DNA sequences, fluorophores, or transition metals to detect protein epitopes5. However, these methods have poor specificity and limited throughput, which makes understanding the interactions and functions of proteins at single-cell resolution a challenge5.
New technologies for analyzing single cells by mass spectrometry are aiming to overcome these challenges without the use of antibodies. Slavov’s lab has led the way with the development of the Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) method and its second generation, SCoPE2.
SCoPE-MS resolved two major challenges of single cell protein analysis by mass spectrometry: 1) delivering the proteome of a mammalian cell to a mass spectrometry instrument with minimal protein losses and 2) simultaneously identifying and quantifying peptides from single-cell samples6. Slavov and team solved the first problem by substituting the initial chemical lysis of cells and liquid chromatography peptide separation steps with manual selection of individual cells under the microscope and then mechanical lysis via sonication6. This meant that proteins were not lost during the post chemical lysis clean-up process or via adherence to the large surface area of liquid chromatography columns, which are steps typically necessary prior to mass spectrometry.
They used tandem mass tags (TMTs), combined with tagging of carrier cells, to overcome the second problem. The TMTs are chemical labels that covalently bind to peptides. Different cell types are given different tags and mixed with around 200 tagged carrier cells that also help to reduce losses due to adhesion during the liquid chromatography-based separation of peptides that is required prior to mass spectrometry. This multiplexed method was originally used to quantify over a thousand proteins in differentiating mouse embryonic stem cells6.
SCoPE2 improved on the initial protocol by introducing automated and miniaturized sample preparation, which substantially increased quantitative accuracy and throughput while lowering cost and hands-on time7. Slavov and team also developed methods for optimizing the acquisition of mass spectrometry data and for interpreting these data once acquired. These modifications enabled the quantification of approximately 1000 proteins per single cell and over 3000 proteins across many cells in around 90 minutes of analysis time7. These initial methods paved the way for a next generation framework, plexDIA, that enables multiplexing of both peptides and samples8. This parallel process results in a multiplicative throughput increase that will drive further scaling up of single-cell proteomics9, 10.
Another key innovation that has advanced the field is the development of the NanoPOTS (nanodroplet processing in one pot for trace samples) platform by Ryan Kelly, from the Pacific Northwest National Laboratory in Richland, Washington, and colleagues. NanoPOTS addresses the issues of miniaturizing protein digestion and clean-up by reducing sample processing volumes to less than 200 nL, which reduces protein losses due to nonspecific surface adsorption11. When combined with ultrasensitive liquid chromatography-mass spectrometry, nanoPOTS allows identification of around 1500–3000 proteins from approximately 10–140 cells, respectively11.
There are many more pioneers, including Erwin Schoof and Karl Mechtler, who are working hard to develop new protocols and technology that will improve the sensitivity and throughput of mass spectrometry for single-cell proteome analysis. The next step is to apply these methods to translational medicine.
“I think mass spectrometry is well positioned for translational medicine but of course we have other technologies coming out, for example, affinity-based methods from SomaLogic or Olink,” said Andreas Huhmer, PhD, senior director Omics Technology business development at Thermo Fisher Scientific.
Olink’s Proximity Extension Assay technology merges an antibody-based immunoassay with the properties of polymerase chain reaction (PCR) to enable high-throughput protein detection, while SomaLogic report that they can simultaneously measure 7000 proteins per sample and over 1000 clinical samples per day using their aptamer-based SomaScan platform. The technical performance of the current plasma 7K SomaScan assay was recently assessed by Julián Candia and colleagues at the National Institutes of Health13. They concluded that the assay offers “tremendously promising opportunities […] with its increasingly expanding protein coverage and consistently low variability.” They also comment on the assay’s “remarkable sensitivity” but note that “more work is needed to fully address background noise, limits of detection, specificity, cross-reactivity, and orthogonal reproducibility.”
In addition, research12,14,15 has shown that SomaScan does not always correlate with commonly used laboratory measures. For example, Tariq Faquih and co-workers from Leiden University Medical Center in the Netherlands showed that SomaScan measurement had poor agreement with the standard laboratory measurements, while Caroline Lopez-Silva (Johns Hopkins University School of Medicine, Baltimore, Maryland) et al found that the technique correlated poorly with gold-standard immunoassays for five of nine biomarkers for chronic kidney disease.
Another technique that should be highlighted when discussing single cell proteomics is CyTOF–short for cytometry by time of flight, said Michael MacCoss, PhD, Professor of Genome Sciences at the University of Washington in Seattle. This mass spectrometry-flow cytometry hybrid device has been pioneered for use in single cell proteomics by Gary Nolan’s lab at Stanford University. The method allows multiplexing of flow cytometry so that 40 proteins can be measured at once. Its biggest application so far has been looking at the heterogeneity of immune cells in the blood.
Nolan and colleagues have also developed a method called multiplexed ion beam imaging (MIBI) that can analyze up to 100 targets simultaneously over a five-log dynamic range in a similar way to CyTOF, but in addition to measuring protein levels on individual cells, it also provides information about cell morphology and localization16. Both CyTOF and MIBI rely on antibodies to label specific proteins and are therefore subject to the limitations discussed previously.
Single cell proteogenomics
The improvements in sensitivity and throughput of single-cell proteomics techniques have meant that researchers are now beginning to combine their findings with genomics data to further understand what is happening in the cells.
“I think the biggest impact [of single cell proteogenomics] is going to be identifying mechanisms of disease and identifying drug targets,” said Slavov. “Potentially identifying biomarkers for stratification that can then be measured with cheaper assays in the clinic.”
One example he gave of how proteogenomics might identify mechanisms of post-transcriptional regulation and systematic differences between RNA and protein abundances, and how these might contribute to cancer, is the tumor suppressor protein p53. This protein “is regulated almost entirely by degradation and we and other colleagues have repeatedly found that its protein abundance, modifications, activity are almost completely decoupled, unrelated to the transcript level,” Slavov said.
Mallick also noted another example of proteogenomic regulation: “the hypoxia-inducible factor’s (HIF1a) transcript is constitutively produced in cells. However, typically HIF1a protein abundance is low because in normal oxygen conditions the protein is degraded as quickly as it is manufactured. The moment when there is not enough oxygen, protein degradation is halted, the protein abundance skyrockets and the hypoxia response is triggered.”
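The decoupling that Slavov and Mallick describe follows from a simple balance between synthesis and degradation. The sketch below is a textbook-style simplification, not a model from either lab: it assumes protein is made at a rate proportional to transcript level and removed at a rate proportional to its own abundance, so the steady-state level is synthesis divided by the degradation rate. All rate values and the function name are hypothetical.

```python
def steady_state_protein(mrna, k_syn, k_deg):
    """Steady state of dP/dt = k_syn * mrna - k_deg * P.

    mrna  : transcript level (arbitrary units)
    k_syn : proteins made per transcript per unit time
    k_deg : fraction of the protein pool degraded per unit time
    """
    if k_deg <= 0:
        raise ValueError("k_deg must be positive for a steady state to exist")
    return k_syn * mrna / k_deg


# Same transcript level in both conditions (constitutive expression);
# only the degradation rate differs. All numbers are illustrative.
mrna = 100.0
normoxia = steady_state_protein(mrna, k_syn=2.0, k_deg=10.0)  # fast degradation
hypoxia = steady_state_protein(mrna, k_syn=2.0, k_deg=0.1)    # degradation nearly halted

print("normoxia protein level:", normoxia)
print("hypoxia protein level:", hypoxia)
print("fold change with identical transcript level:", hypoxia / normoxia)
```

Under these assumptions an RNA measurement would report no difference between the two conditions, while the protein level differs by two orders of magnitude.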
Proteogenomics has also been used to study cancer heterogeneity. A multiomic analysis by Sascha Dietrich, from the University of Heidelberg in Germany, and colleagues refined the classification of chronic lymphocytic leukemia17, while Matthew Ellis, from the Baylor College of Medicine in Houston, Texas, and co-investigators have used proteogenomics to identify markers of chemotherapy resistance in triple-negative breast cancer18.
Alfaro and colleagues are employing proteogenomics analyses to develop cancer vaccines. “In the context of cancer vaccines, now we can go in and we can understand what is the landscape of mutations that are presented to the immune system. This is important because it allows us to target and understand what are recurring neoantigens that are being presented to the immune system,” he said. “We would love to be able to do this in single cells but the technology is just not there to capture this heterogeneity just yet.”
However, Slavov urges caution “from thinking that we are going to have transformative results overnight.” He sees single-cell proteomics and proteogenomics as “the right approach for the future as a sustained long-term effort that results in steady marching of science and steady progress but not overnight success.”
Limitations of single cell mass spectrometry
Proteomics today has not yet been able to identify and fully sequence all the proteins in a human tissue sample, let alone single cells, remarks Alfaro. The proteome is challenging for many reasons and indeed, current mass-spectrometry techniques still have many drawbacks when it comes to analyzing complex protein mixtures, even those from single cells. The main issue facing proteomics is the dynamic range problem – some proteins are present in only a few copies per cell and some are present in tens of millions of copies per mammalian cell, write MacCoss, Alfaro, Slavov and colleagues in their preprint looking at emerging single molecule and mass spectrometry methods for sampling the proteome19.
Dealing with this extreme dynamic range involves finding ways to count incredible numbers of peptides. The mass spectrometry community has made choices along the way (e.g. in enzyme choice for bottom-up techniques and the decision to manipulate ions in the gas phase) to generate many ‘like’ peptide ions that can be sorted before counting. The paradigm is to sort peptide ions, sequence a few and simply count the rest of these like-ions. Using this strategy, mass spectrometry can dive into the dynamic range of the cell more effectively. Nevertheless, all of these choices create sequencing bias. “Peptides need to be able to move into the gas phase and be manipulated in the gas phase, for example. Not all sequences can do this. So, all of our sample preparation methods in proteomics for mass spectrometry are around producing ‘like’ ions that can be effectively sorted, moved into the gas phase and identified, and that has an impact on how much of the proteome we are able to actually ‘sequence’,” Alfaro remarked.
Makarov believes there is still “a lot to gain” in terms of sensitivity at most of the stages of mass spectrometry. “So, starting from ionization efficiency that we don’t want to leave any peptides behind just because they’re poorly ionizable, and then through transport of these ions from atmosphere to vacuum, and then utilization of these ions inside vacuum.” He says that there is still room for “orders of magnitude” improvement in sensitivity by improving utilization of the ions inside the vacuum “which hopefully will be realized within the nearest several years.”
This should then make much higher throughput analysis possible, enabling hundreds of single cell analyses per day. Makarov thinks that if these improvements are “accompanied by improvements in labeling chemistry when we do barcoding with tandem mass tags or similar then we should be able to do it at not 18 plex like we do now but probably more like 100 plex, which will bring us into thousands of single cells per day and the depth of analysis should also improve.”
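The arithmetic behind Makarov's estimate is simple multiplication: cells per day equals the number of labeled single cells per run times the number of runs per day. The sketch below assumes one run per hour (24 runs per day), a figure chosen purely for illustration and not taken from the article; the function and parameter names are hypothetical.

```python
def cells_per_day(plex, runs_per_day, reference_channels=0):
    """Single cells analyzed per day with multiplexed labeling.

    plex               : number of distinct mass tags per run
    runs_per_day       : instrument runs completed in 24 hours (assumed)
    reference_channels : tags reserved for carriers or controls, if any
    """
    if reference_channels >= plex:
        raise ValueError("reference channels must leave at least one tag for cells")
    return (plex - reference_channels) * runs_per_day


RUNS_PER_DAY = 24  # illustrative assumption: one run per hour

print("18 plex:", cells_per_day(18, RUNS_PER_DAY), "cells per day")
print("100 plex:", cells_per_day(100, RUNS_PER_DAY), "cells per day")
```

With that assumed run rate, 18 plex lands in the hundreds of cells per day and 100 plex in the thousands, consistent with the orders of magnitude Makarov quotes. Reserving channels for carrier or reference samples lowers both figures proportionally.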
With these improvements, Makarov says it should be possible to identify the top five to seven thousand most abundant proteins per cell within this decade.
Single Molecule Protein Sequencing
In addition to advances in single-cell mass spectrometry proteomics we now “have a whole swath of emerging single molecule protein sequencing technologies, which declare huge promises,” noted Makarov. He pointed out that together the start-ups in this field have “collected probably more investment than mass spectrometry instrument development has received over the entire 100 years of its existence.”
“Single molecule sequencing will provide the ultimate sensitivity,” said Chirlmin Joo, PhD, professor of single molecule biophysics at Delft University of Technology in the Netherlands. “You will be identifying every protein whether it’s from a single cell or from a mixture of cells or tissues. Yet it remains to be seen whether it can reach single-cell analysis anytime soon, due to the complexity in the sample handling and its yet-lower dynamic range than that of mass spectrometry.”
In line with this, Alfaro pointed out that “any technology that focuses on counting one molecule at a time is randomly sampling either peptides or proteins from their mixture meaning you’re always going to end up sampling the thing that is most abundant, especially in the context of dynamic range.” That being said, he believes it’s of significant promise that the single-molecule protein sequencing community is also thinking about protein separation, noting that “the same paradigm of sorting ‘like’ molecules and sequencing a few from each group that has made mass spectrometry so successful, could eventually be pushed to the limit of sensitivity by these single-molecule technologies.”
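Alfaro's point about random sampling can be put into numbers. If each molecule read is an independent random draw from the mixture, the chance of seeing a given protein at least once in N reads is 1 - (1 - p)^N, where p is that protein's share of all molecules. The sketch below assumes independent draws and a cell containing one billion protein molecules in total; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
import math


def detection_probability(copies, total_molecules, reads):
    """Probability of sampling a protein at least once in `reads` random draws.

    Assumes every draw is independent and picks a molecule in proportion
    to abundance. Computed as 1 - (1 - p)**reads using log1p/expm1 so that
    very small p does not get lost to rounding.
    """
    p = copies / total_molecules
    if not 0 < p < 1:
        raise ValueError("copies must be positive and less than total_molecules")
    return -math.expm1(reads * math.log1p(-p))


TOTAL = 1e9   # assumed total protein molecules in the sample
READS = 1e6   # assumed number of single-molecule reads

for copies in (10, 1e3, 1e5, 1e7):
    prob = detection_probability(copies, TOTAL, READS)
    print(f"{copies:>12,.0f} copies -> P(seen at least once) = {prob:.4f}")
```

Under these assumptions a protein present at ten copies is seen in roughly 1 percent of experiments, while one present at ten million copies is essentially always seen. Sorting ‘like’ molecules before counting, as Alfaro notes, is one way to avoid spending most reads on the most abundant species.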
Joo added: “It is an emerging field with many completely different ideas from very diverse disciplines including single-molecule fluorescence, nanopore detection, DNA technologies, super-resolution imaging, NEMS (Nanoelectromechanical systems), quantum tunneling, force spectroscopy, and plasmon resonance, if I name only a few. There has been a fast transfer of knowledge from organic chemistry, protein chemistry, biochemistry, bioinformatics, computational sciences, clinical chemistry, as well as mass spectrometry.”
Joo and colleagues are using single-molecule fluorescence techniques combined with DNA technologies to carry out full length protein analyses. These methods should be able to give almost real-time monitoring of abnormal protein expressions in a patient sample, said Joo, but he stresses that while single molecule protein sequencing is “an emerging field with great expectations, proteins are complex, and it is therefore important to be patient.”
“I often get asked by others including investors which method will be the final winner,” Joo said. His answer: “I do not know, because this is an extremely high-risk, high-gain field with many unknowns.”
Nautilus Biotechnology are taking a different approach that will initially be used on bulk samples but is potentially applicable to single-cell studies. Mallick points out that when analyzing low abundance proteins in small numbers you do not have the amplification capabilities that PCR has for DNA. You therefore need “a technique that is sensitive enough to measure handfuls of molecules. You cannot get more sensitive than a single molecule measurement.”
The Nautilus proteome analysis platform is designed to overcome some of the coverage, sensitivity and dynamic range challenges of mass spectrometry-based approaches. In the approach, sample proteins are immobilized onto a nanofabricated, large-scale, single-molecule protein array. Next, the proteins are iteratively interrogated with a series of fluorescently labeled, non-traditional affinity reagents that bind to short epitopes, three to four amino acids long. Data from this iterative probing is interpreted via a machine learning analysis to potentially identify and quantify the proteome with extreme sensitivity and scale. The array is designed to measure billions of individual protein molecules simultaneously. “We wanted to capture every protein from the sample and glue it down,” said Mallick.
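The idea of identifying a protein from a series of short-epitope binding events can be illustrated with a toy decoder. This is not the Nautilus method: the platform is described as using machine learning on imperfect binding data, whereas the sketch below swaps in exact, noise-free matching to keep the logic visible. The protein names, sequences, and probe epitopes are all invented for the example.

```python
def binding_pattern(sequence, probes):
    """Record, probe by probe, whether its short epitope occurs in the sequence."""
    return tuple(epitope in sequence for epitope in probes)


def decode(pattern, database, probes):
    """Return every database protein whose expected pattern matches the observation."""
    return [
        name
        for name, sequence in database.items()
        if binding_pattern(sequence, probes) == pattern
    ]


# Hypothetical three-residue epitopes and made-up protein sequences.
PROBES = ["TAY", "AKQ", "GGS", "NAY"]
DATABASE = {
    "protA": "MKTAYIAKQR",
    "protB": "MKLVNAYGGS",
    "protC": "GGSTAYQQRL",
}

# Simulate probing one immobilized molecule of protC, one probe per cycle.
observed = binding_pattern(DATABASE["protC"], PROBES)
print("observed pattern:", observed)
print("candidates:", decode(observed, DATABASE, PROBES))
```

No single probe is specific to one protein here (the first epitope occurs in two of the three sequences), yet the combined pattern across cycles is unique. That is the general reason why many low-specificity reagents, read out iteratively, can in principle distinguish far more proteins than there are reagents.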
Nanopore methods are also being used to look at single molecule protein sequencing, either by reading the amino acid sequence of linearized peptides, fingerprinting linearized proteins, or characterizing and identifying folded proteins20. Nanostring, Northern Nanopore Instruments, Oxford Nanopore Technologies, and several other companies are all key players in that field.
Outside of nanopores, Erisyon have built a single molecule protein sequencer based on fluorosequencing. Another company to watch is Quantum-Si, which has built a single molecule detection platform using its proprietary Time Domain Sequencing technology.
One thing that all companies developing new proteomics techniques, be that single cell or single molecule, need to consider if they want to have a clinical impact on precision medicine is that the outcomes must be reproducible for decades to come and that requires a lot of infrastructure, said Huhmer. “Thermo Fisher Scientific is in a position to develop all these tools; we make calibrators and calibrants, reference standards etc for many clinical analyzers. So obviously, these are all areas that will play a vital role in how we can implement this technology in the clinic.”
In addition, Slavov and other experts in the field of mass spectrometry, including MacCoss, have written a white paper due to be published in Nature Methods that provides initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments21. “I think that’s going to be an inflection point for the field because it establishes beyond a doubt the robustness of the methods and makes recommendations for how to carry out these measurements and experiments. I think it’s going to be very helpful for the field to build on solid foundations,” he remarked.
Makarov elegantly summarizes the current landscape of proteomics by saying he thinks that “although the future of mass spectrometry is crowded with competition it still could be bright if we correctly develop and do it at speed and scale as we move forward.” Alfaro comments that “so long as emerging technologies focus on sampling different parts of the proteome or on having different sequencing biases, new technologies stand to complement the state of the art.”
References
1. Washburn MP, Wolters D, Yates III JR. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotechnol 2001; 19: 242–247.
2. Makarov A. Orbitrap journey: taming the ion rings. Nature Communications 2019; 10: 3743.
3. Thermo Fisher Scientific. Challenges and emerging directions in single-cell proteomics. Will it go mainstream like genomics? White Paper 65730. [last accessed 23rd November 2022]
4. Slavov N. Learning from natural variation across the proteomes of single cells. PLoS Biol 2022; 20: e3001512.
5. Slavov N. Unpicking the proteome in single cells. Science 2020; 367: 512–513.
6. Budnik B, Levy E, Harmange G, Slavov N. SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol 2018; 19: 161.
7. Specht H, Emmott E, Petelski A, et al. Single-cell proteomic and transcriptomic analysis of macrophage heterogeneity using SCoPE2. Genome Biol 2021; 22: 50.
8. Derks J, Leduc A, Wallmann G, et al. Increasing the throughput of sensitive proteomics by plexDIA. Nature Biotechnol 2022.
9. Framework for multiplicative scaling of single-cell proteomics. Nature Biotechnol 2022.
10. Slavov N. Scaling Up Single-Cell Proteomics. Mol Cell Proteom 2021; 2: 100179.
11. Zhu Y, Piehowski PD, Zhao R, et al. Nanodroplet processing platform for deep and quantitative proteome profiling of 10–100 mammalian cells. Nature Commun 2018; 9: 882.
12. Faquih T, Mook-Kanamori DO, Rosendaal FR, et al. Agreement of aptamer proteomics with standard methods for measuring venous thrombosis biomarkers. Res Pract Thromb Haemost 2021; 5: e12526.
13. Candia J, Daya GN, Tanaka T, et al. Assessment of variability in the plasma 7k SomaScan proteomics assay. Scientific Reports 2022; 12: 17147.
14. Lopez Silva C, Surapaneni A, Coresh J, et al. Comparison of Aptamer-Based and Antibody-Based Assays for Protein Quantification in Chronic Kidney Disease. Clin J Am Soc Nephrol 2022; 17: 350–360.
15. Pietzner M, Wheeler E, Carrasco-Zanini J, et al. Synergistic insights into human health from aptamer- and antibody-based proteomic profiling. Nature Comm 2021; 12: 6822.
16. web.stanford.edu/group/nolan/technologies.html [last accessed 23rd November 2022].
17. Herbst SA, Vesterlund M, Helmboldt AJ, et al. Proteogenomics refines the molecular classification of chronic lymphocytic leukemia. Nature Comm 2022; 13: 6226.
18. Anurag M, Jaehnig EJ, Krug K, et al. Proteogenomic markers of chemotherapy resistance and response in triple-negative breast cancer. Cancer Discov 2022; 12: 2586–2605.
19. MacCoss M, Alfaro J, Wanunu M, et al. Sampling the proteome by emerging single-molecule and mass-spectrometry methods. [Preprint last accessed 23rd November 2022].
20. Alfaro JA, Bohlander P, Dai M, et al. The emerging landscape of single-molecule protein sequencing technologies. Nature Methods 2021; 18: 604–617.
21. Gatto L, Aebersold R, Cox J, et al. Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments. [Preprint last accessed 23rd November 2022].
Laura Cowen is a freelance medical journalist who has been covering healthcare news for over 10 years. Her main specialties are oncology and diabetes, but she has written about subjects ranging from cardiology to ophthalmology and is particularly interested in infectious diseases and public health.