Not long ago, Howard Wainer, a statistician I mentioned recently, learned that his blood sugar was too high. His doctor told him to lose weight or risk losing his sight. He quickly lost about 50 pounds, which put him below 200 pounds. He also started making frequent measurements of his blood sugar, on the order of 6 times per day, with the goal of keeping it low.
It was obvious to him that the conventional (meter-supplied) analysis of these measurements could be improved. The conventional analysis emphasized means. You could get the mean of your last n (20?) readings, for example. That told you how well you were doing, but didn’t help you do better.
Howard, who had written a book about graphical discovery, made a graph: blood sugar versus time. It showed that his measurements could be divided into three parts:
measurement = average + usual variation + outlier (= unusual variation)
Of greatest interest to Howard were the outliers. Most were high. They always happened shortly after he ate unusual food. Before a reading of 170, for example, he had eaten a pretzel. He had not realized a pretzel could do this. He stopped eating pretzels.
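The three-part split above can be sketched in a few lines of Python. This is only an illustration, not Howard's method (he used a graph). It assumes the median stands in for the "average", the median absolute deviation (MAD) measures usual variation, and a cutoff of 3.5 scaled deviations marks an outlier. The function name `decompose`, the cutoff, and the readings are all made up for the example.

```python
from statistics import median

def decompose(readings, threshold=3.5):
    """Split each reading into average + residual and flag outliers.

    Assumptions of this sketch: the median is the "average", and
    1.4826 * MAD is the scale of usual variation.
    """
    center = median(readings)
    residuals = [r - center for r in readings]
    mad = median([abs(e) for e in residuals])
    # 1.4826 * MAD approximates the standard deviation for normal data
    scale = 1.4826 * mad
    rows = []
    for r, e in zip(readings, residuals):
        # A reading is an outlier if its residual is far outside usual variation
        is_outlier = scale > 0 and abs(e) / scale > threshold
        rows.append((r, e, is_outlier))
    return center, rows

# Hypothetical readings in mg/dL, invented for illustration (not Howard's data)
readings = [102, 98, 110, 105, 95, 170, 101, 99, 108, 104]
center, rows = decompose(readings)
outliers = [r for r, _, flag in rows if flag]
print(center, outliers)
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme reading barely moves them, so the outlier does not hide itself by inflating the yardstick it is measured against. The flagged readings are the ones worth matching against a food diary.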
When Howard told me this, it was like a door had opened a tiny crack. Recently a deep-sea treasure-hunting company found a shipwreck off the coast of Spain. They named it Black Swan, apparently a reference to Nassim Taleb’s book. Shipwrecks are black swans on the ocean floor; black-swan weather had sunk the ship. For Howard, outliers were another kind of buried treasure: the key to saving his sight.
It isn’t just Howard. Outliers are buried treasure in all science. They are a source of new ideas, especially the new ideas that lead to whole new theories. The Shangri-La Diet derived from an outlier: Unusually low hunger in Paris. My self-experimentation about faces and mood started with an outlier: One morning I felt remarkably good. My discovery that standing improved my sleep started with a series of days when I slept unusually well.
Modern statistics began a hundred years ago with the t test, the analysis of variance, and p values, all very useful tools. Almost all scientists use them or their descendants. Almost all statistics professors devote themselves to improvements along these lines. However, conventional statistical methods, the t test and so on, deal only with usual variation. (Exploratory data analysis is still unconventional.) As Taleb has emphasized, outliers remain not studied, not understood, and, especially, not exploited.
Before the invention of statistical tests, such as the t test, science moved forward. People gathered data, computed averages, drew reasonable conclusions. As far as I can tell, modern ways of analyzing data improved the linkage between data and conclusion because they reduced a big source of noise: How the data were analyzed. Procedures became standardized. Hypothesis testing improved. Hypothesis formation, however, did not improve. Knowing how to do a t test and the philosophy behind it will not help you come up with new ideas. Yet data can be used to generate new ideas, not just test the ones you already have.
Our understanding of outliers is in a kind of pre-t-test era. People use them in an unstructured way. As Howard Wainer’s analysis of his blood sugar data indicates, better use of them will improve hypothesis formation. A kind of standardized treatment should help generate ideas, just as the t test and related ideas helped test ideas. Here are some questions I think can be answered:
1. Cause. What causes outliers? It’s a step forward to realize that outliers are often caused by other outliers. Howard has found that unusually high blood sugar readings are caused by eating unusual (for him) foods.
2. Inference. I’m fond of saying that lightning doesn’t strike twice in one place for different reasons. The longer version: if two outliers could have the same explanation, they probably do. I think this principle can be improved.
3. Methodology. To test ideas, you want variation to be low. To generate ideas, you want outlier rate to be high. Howard could make progress in understanding what controls his blood sugar by deliberately testing foods that might produce outliers. In genetics, x-rays and chemical mutagens have been used to increase mutation rates; mutations are outliers. (Discovery of a white-eyed mutant fruit fly led to a wealth of new genetic ideas.) In physics, particle accelerators increase the outlier rate in order to discover new subatomic particles.
There are no comparable procedures for psychology. Self-experimentation increased my rate of new ideas because it increased my outlier detection rate. It increased that rate for three reasons:
1. I kept numerical records.
2. I analyzed my data using the same methods as Howard.
3. I did experiments.
Travel is like experimentation; there too it helps to keep numerical records and analyze them. The question: What are the basic principles for increasing outlier rate?