Modern biology has a problem: how to find meaning in the rising oceans of genomic data, such as the reams of cancer mutations that genome-wide studies publish every week. The challenge is finding efficient ways to separate the signal from the noise.
Researchers have now fused statistical mechanics and a learning algorithm into a mathematical toolkit that turns cancer-mutation data into multidimensional models showing how specific mutations alter the social networks of proteins in cells. From these models, biologists can deduce which of the myriad mutations present in cancer cells might actually help drive disease.
Statistical mechanics predicts the macroscopic properties of large systems from the behavior of their microscopic components.
"Here we have found that a fundamental concept in statistical mechanics, which many of us learned as undergraduates in theoretical physics courses and then largely forgot because it didn't apply to our everyday lives as biologists, can be relevant to one of the most difficult problems in cancer genetics," said Peter Sorger, a professor of Systems Pharmacology at Harvard and senior author of a paper in Nature Genetics.
Dark Matter Matters
Many of the most widely studied cancer genes, such as p53 and Ras, were discovered after decades of work by many groups. Today, in the era of high-throughput genomics, researchers have thousands of times more data from thousands of samples, and the catalog of cancer mutations has grown vast. But not all mutations actually influence tumor behavior. Many appear to be along for the ride, so to speak, and are therefore called "passenger mutations."
To separate the drivers from the passengers, researchers typically use a kind of "polling" strategy: they identify the most common mutations, reasoning that those are the significant ones. Only the most promising candidates are then subjected to the detailed, painstaking analysis that has been applied to p53 and Ras.
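As a rough illustration of the polling idea, here is a minimal sketch that ranks mutations by how many tumors carry them and keeps only those above a recurrence cutoff. The function name `poll_recurrent_mutations`, the `min_samples` cutoff and the toy cohort are hypothetical; real pipelines use statistical models of background mutation rates rather than a bare count.

```python
from collections import Counter


def poll_recurrent_mutations(samples, min_samples):
    """Return mutations observed in at least `min_samples` distinct tumors.

    Toy illustration of frequency-based ("polling") prioritization: results
    are ordered by how many tumors carry each mutation, ties broken by name.
    """
    counts = Counter()
    for sample in samples:
        # Count each mutation at most once per tumor.
        counts.update(set(sample))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [mutation for mutation, n in ranked if n >= min_samples]


# Hypothetical toy cohort: each inner list holds the mutations found in one tumor.
cohort = [
    ["mut_A", "mut_B", "mut_C"],
    ["mut_A", "mut_B"],
    ["mut_A", "mut_D"],
    ["mut_E"],
]

print(poll_recurrent_mutations(cohort, min_samples=2))  # ['mut_A', 'mut_B']
```

Everything below the cutoff, here `mut_C`, `mut_D` and `mut_E`, never reaches the detailed follow-up stage, which is exactly how rare mutations drop out of this kind of analysis.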
Mohammed AlQuraishi, an independent Systems Biology fellow at Harvard Medical School (HMS) associated with the Laboratory of Systems Pharmacology (LSP) and the Sorger lab, and lead author of the paper, reasoned that biologists were in dire need of far more biophysically rigorous tools for scouring these data. With a background in genetics, statistics and physics, AlQuraishi realized that biologists could exploit the statistical power of live data sets and marry it to theoretical physics. "It's the way that Silver and Feynman together would do it," he joked.
Statistical mechanics is a precise physical description of how collections of individual molecules give rise to the macroscopic properties we perceive, such as temperature and pressure. AlQuraishi used its core principles as the basis for a platform to analyze data from The Cancer Genome Atlas. This allowed him to generate detailed schematics of how certain mutations alter the vast, complex world of protein social networks inside cells, networks that largely determine a cell's health or lack thereof. In doing so, he made a few unexpected findings.
Some cancer mutations are common, but many more are rare, some so rare that they occur in only a handful of patients. AlQuraishi found that common and rare mutations are equally likely to affect the protein network.
"Both kinds of mutations are equally strong," he said. "In both cases, about one percent of the common and one percent of the rare mutations alter the tumor networks we studied. But rare mutations are being largely ignored. We need to start paying attention to them."
For every common mutation there are approximately four rare ones, so by sheer numbers rare mutations may be far more significant than previously suspected. "That's where much of the action is, in the rare mutations. We've long considered this large universe of rare mutations to be dark matter, but here we have just found that all this dark matter actually matters."
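The arithmetic behind the "dark matter" argument can be made explicit. This minimal sketch assumes the round figures quoted above (the same 1 percent hit rate for common and rare mutations, and four rare mutations per common one) and a hypothetical pool of 1,000 common mutations; the function name `network_altering_counts` and the pool size are illustrative, not taken from the study.

```python
def network_altering_counts(n_common, rare_per_common=4, hit_percent=1):
    """Expected number of network-altering mutations in each class,
    assuming both classes alter the network at the same rate (hit_percent)."""
    n_rare = n_common * rare_per_common
    common_hits = n_common * hit_percent / 100
    rare_hits = n_rare * hit_percent / 100
    return common_hits, rare_hits


common_hits, rare_hits = network_altering_counts(1000)
rare_share = rare_hits / (common_hits + rare_hits)
print(common_hits, rare_hits, rare_share)  # 10.0 40.0 0.8
```

Because the hit rates are equal, the pool size and the rate cancel out: rare mutations account for 4 / (1 + 4), or 80 percent, of the network-altering mutations under these assumptions.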
Reproducing Results
The researchers also found that mutations are not the blunt instrument they had expected. Rather than knocking out an entire branch of a network (the equivalent of a neighborhood power outage) or inserting an entirely new character (a new protein), mutations subtly, almost surgically, alter the network's communication pathways.
"From the perspective of the mutation, it is hard to be so precise," said AlQuraishi. "But cancer can't be too disruptive, or else it might die. It needs to fly under the radar. This subtle altering of networks achieves that objective. Drug companies can exploit this and possibly develop more targeted therapies."
The findings also bear on a final problem: the difficulty of reproducing published results in the scientific literature. The researchers used fundamental physical principles to process datasets from different laboratories (including their own) in a way that removes false positives and enriches for true positives. As a result, the model is more accurate and reproducible than any single data set.
"We can clean up the experiments by only using data that both the model and experiments agree on," said AlQuraishi.
"In general, much of the problem with irreproducibility in science is a problem of poor statistics," said Sorger. "We addressed that directly here."
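The agreement idea in AlQuraishi's quote can be illustrated with a simple set intersection. This is a minimal sketch, not the paper's statistical-mechanical method: it assumes the model outputs a confidence score per protein pair and that each lab reports a set of observed interactions. The function name `consensus_interactions`, the 0.5 threshold and all protein labels are hypothetical.

```python
def consensus_interactions(model_scores, experiments, threshold=0.5):
    """Keep only interactions that the model supports (score >= threshold)
    AND that at least one experimental dataset reports."""
    model_calls = {pair for pair, score in model_scores.items() if score >= threshold}
    observed = set().union(*experiments)
    return sorted(model_calls & observed)


# Hypothetical model scores for protein pairs (higher = more confident).
model_scores = {("P1", "P2"): 0.9, ("P1", "P3"): 0.2, ("P2", "P4"): 0.7}
# Hypothetical interaction calls from two labs.
lab_1 = {("P1", "P2"), ("P1", "P3")}
lab_2 = {("P1", "P2"), ("P3", "P5")}

print(consensus_interactions(model_scores, [lab_1, lab_2]))  # [('P1', 'P2')]
```

Pairs reported only by a lab, such as `("P1", "P3")`, or predicted only by the model, such as `("P2", "P4")`, are dropped as possible false positives; only the pair both sources agree on survives.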