Three Sigma Are Nothing!

This morning I had a funny dream, and as I woke up at the end of it and looked at the clock with the only eye I had managed to open, I realized it was not yet really time to wake up. On the other hand, I really liked the dream I had had: it was quite vivid and detailed, plus it offered an occasion for a blog post!

Hence I crawled out of the bed and reached for the nearest laptop in order to download the contents of my mind before it made room for something else and the dream got lost forever.

In the dream I am at a physics laboratory I cannot at first identify. I am being asked by a physics group convener (this is a CDF colleague, but the experiment does not look like CDF) to give a talk on the results of unblinding the second half of the data for an analysis seeking a signal of Supersymmetric particles in events with B hadrons.

[The word "unblinding" indicates the procedure of looking at the data, which in a blind analysis is hidden from the analysts in order to avoid the involuntary bias they may introduce in the results. Also, note that there are not many such SUSY signals that can appear in B systems: besides the dimuon decay of B_s mesons, which could betray the existence of new heavy particles through its abnormally high rate (note: nothing like that happened, and indeed the B_s branching ratio has been recently measured to be quite in line with standard model predictions) few other similar possibilities to study SUSY are presently investigated. But this is a dream, and in the dream it appears that the signal could appear as a difference in the behaviour of distinguishable species of B hadrons.]

I meet the analysis authors shortly before the meeting. They are crying. They are really desperate about the new data not confirming the signal they had seen in the old data. Instead of growing, the significance of the excess of events they see has fallen from three sigma to two sigma as the new data is added - a clear sign that the three sigma were a fluctuation. They tell me that they had actually already had a party to celebrate the discovery, sure as they were that this was indeed it.

As I mentioned above, the setting is unusual. As the dream proceeds it seems the context is that of the CMS experiment; yet the room I am standing in is not one of the rooms of Building 40 at CERN where similar meetings are usually run. It is not any of the CDF meeting rooms at Fermilab either. It seems like a meeting room at Point 5, the location of the CMS detector along the LHC ring; but I do not recognize the place. The mix-up of CERN and FNAL settings is also evidenced by the fact that this seems a sort of trailer complex.

The room is not large, and we are sitting in the midst of electronics crates and other paraphernalia. As I then sit down and load the talk I need to give, I see that the meeting page (on something that must be indico, the meeting archiving system used at CERN; but it doesn't look like it) actually contains ATLAS stuff. It really looks like an ATLAS meeting has just ended in fact, from 8 to 10 in the morning; now it is ten o'clock. In fact, I just met a CDF colleague now in ATLAS, Paolo Giromini, while he was immersed in a thick discussion with colleagues, explaining with the help of the blackboard his technique of subtracting B hadron backgrounds with a sort of opposite-sign-minus-same-sign trick.

I reload the meeting page on my laptop, and magically the 10-12 meeting schedule now appears. This time it all makes sense. I can thus have a look at the slides of my talk for the first time: the slides have been written by the convener who had asked me to give the talk! This is quite unusual, but not unheard-of. The problem is that I have no time to look at them...

While I am trying to make sense of the slides, the previous talk is going on. In the background I can hear a strange song, which distracts me and makes it hard to hear the speaker. I look around rather annoyed, until I realize that the song is coming from my own laptop! Rather embarrassed, I turn off the volume.

It is soon my turn to talk. I stand up in front of the screen, waiting to start the talk as it is being introduced by the convener of the meeting, and I realize I don't know the content of the slides: I only had time to see the critical one which declares what happens when the second part of the data is looked at. So I decide to run the talk my own way: after all, I figure that everybody must know the details of the analysis well - all except me, perhaps. Hence I figure that rather than following the slides I have been given, I will turn this into an educational talk about delusions of new discoveries, failure to be objective when confronted with some data, and the real meaning of a statistical fluctuation.

Then a funny incident happens. A colleague of mine sitting in the front row jumps up and asks what I am doing there and why, since they all know that we haven't discovered new physics. By doing that she is practically disclosing the surprise part of my talk (most of the audience does not yet know the results of the unblinding procedure!), so I think to myself that this is rather impolite. I answer that I have been asked to give the talk because I am the chairperson of the statistics committee, and the important details of the talk concern the statistical procedure used (blinding the second half of the data, etcetera). The colleague then rises from her chair and leaves the meeting, and inside me I think this is a very childish behaviour.

As I start my talk, the first slide is projected on the wall behind my back. I don't exactly know what is written there, and I will not make reference to it in my talk. Instead, I start discussing the history of the analysis from my perspective: this is a search for SUSY - a supersymmetric signal in our data. I make the point that many sitting in the room must have been among those that, before the start of the experiment, were totally sold on the idea that we would discover SUSY particles right after turning on the machine. Indeed, we had been preached the need for SUSY at the electroweak scale for long enough that many of us had been brainwashed. But reality is different. This allows me to explain the different subjective expectations that each of us had on the result of this, as well as other, analyses.

"As the first half of the data was unblinded", I continue, "we found we had a three-sigma excess which did look like evidence of SUSY particles. But here," - I interject with some acting - "I need to ask you to pause and think about what three-sigma effects are in an experiment like ours."

"We analyze tens of different datasets (triggered by different physics objects), we apply dozens of different sets of selection strategies aimed at studying the most improbable concoctions of our theorist friends; we look at scores of histograms, hoping that a bump or an excess or some other feature will appear here or there; we sometimes do not even know what we are looking for, so anything - anything! - which is not in perfect match with background expectations will cause the most sober of us to raise a gram of eyebrows, and the more enthusiastic to run in circles around the terminal, waving hands and attracting the attention of anybody in a 50-meter radius."

I pause, and then, with a boldface kind of voice, I continue: "So let me ask again. What are three sigma? Yes, I know the numerical answer: three sigma are a 1.35 per mille effect. But I am asking a different question", and here I am walking around the stage, turning to the listeners one by one, "What does a three-sigma effect really represent in an experiment like ours? Three sigma are nothing! Nothing! Please repeat after me: three sigma are nothing!"

At this point the audience is really starting to repeat after me, as I, as if on drugs, am shouting like a preacher, waving my hands and walking sideways with my eyes on those of the listeners. "Three sigma are nothing!".

And here the dream ends, since I at last wake up, wondering whether I had had too much Ouzo the previous evening, or what!

* * *

An obvious question remains to be answered, now that I am done reporting my dream above. Are three sigma really nothing?

Well, it really depends. First of all, one must realize that in some cases one is looking at a numerical excess with no other connotations, while in other cases one has some additional evidence of the nature of the effect, which may provide further evidence in one direction or the other. This may be some shape information of a histogram, which is hard to quantify and is then left aside; or ancillary information from additional sets of data.

In other sciences three sigma are a pretty strong effect, worth an important publication and quite strong claims. But in physics we are much more cautious. I myself have experienced many more three-sigma effects that were due to statistical flukes than ones really due to signals popping up. I can recall even four, five, or even six-sigma effects that were flukes, sometimes combined with involuntary experimental biases; while I can recall only a handful of instances where a 3-sigma effect could be ascribed without doubt to a real signal previously unseen. One of these was the historic discovery of the top quark: those three sigma CDF saw in 1994 from a counting experiment were very convincing, since they were backed up by a (then not used) very strong indication of clustering of the reconstructed events around 175 GeV, when backgrounds were expected to cluster at much lower values.

The first of the two reasons why particle physicists have become wary of three-sigma effects is that these arise naturally if one experiment is actually looking for them in many different places. But there is another reason: our measurements are so complex that we cannot ever be completely sure that we have neglected no possible systematic effect in our measurement; and in some cases these sources of error have quite non-Gaussian probability distributions, such that the concept of "sigma" (the width parameter of the Gaussian distribution) becomes rather meaningless: one may be quite sure that the true value one is measuring is contained in a certain interval [x-s,x+s] 68% of the time, and yet this may not really mean that the true value of the quantity being measured exceeds the x+3s mark only 0.135% of the time.
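
The first of these two reasons is easy to put into numbers. What follows is only a toy sketch in Python, not a description of how any experiment actually computes a trials factor: it assumes that the "many different places" are fully independent looks (in real life datasets and selections overlap), and the function names are made up for the illustration.

```python
import math

def one_sided_p(z):
    """Gaussian probability of fluctuating above +z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def p_at_least_one(z, n_looks):
    """Chance that at least one of n_looks searches shows an excess of z sigma or more.

    Assumption: the searches are statistically independent.
    """
    return 1.0 - (1.0 - one_sided_p(z)) ** n_looks

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        prob = p_at_least_one(3.0, n)
        print(f"{n:5d} looks: P(at least one 3-sigma excess) = {prob:.3f}")
```

Under these assumptions a single look gives a three-sigma excess 0.135% of the time, but with a hundred independent looks the chance of seeing at least one is already around 13%, and with a thousand it is well above one half.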

Also note that moving the bar to "five sigma" (a 0.3 per million effect), as is normally done in particle physics to be able to claim a definitive observation of a new effect, is a lousy fix in such a situation, one may argue. Indeed, it is a rather poor one: if systematics have non-Gaussian tails, it is the unit of measurement, the sigma, which loses its meaning, and no enlargement of the constant multiplying it can make it meaningful. But we do need some convention, apparently. And once such a convention is fixed by common practice, no argument can change that!
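
To see how much the meaning of "sigma" can drift, here is a small toy comparison in Python. The choice of a Laplace (double-exponential) error is purely my assumption for the sake of illustration, and nobody claims real systematics follow it: it is simply a distribution with fatter tails than the Gaussian, scaled here so that the interval [x-s,x+s] still contains the true value 68.27% of the time.

```python
import math

CENTRAL = 0.6827  # probability content of the quoted [x-s, x+s] interval

def gaussian_tail(k):
    """P(value > x + k*s) when the error is Gaussian with width s."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def laplace_tail(k):
    """P(value > x + k*s) for a Laplace error with the same central interval.

    For a Laplace distribution of scale b, P(|X| < s) = 1 - exp(-s/b),
    so exp(-s/b) = 1 - CENTRAL and the upper tail is 0.5 * exp(-k*s/b).
    """
    return 0.5 * (1.0 - CENTRAL) ** k

if __name__ == "__main__":
    for k in (3, 5):
        g, l = gaussian_tail(k), laplace_tail(k)
        print(f"{k} 'sigma': Gaussian {g:.2e}, Laplace {l:.2e}, ratio {l / g:.0f}")
```

In this toy the two error models agree on the 68% interval, yet the probability of exceeding x+3s is roughly ten times larger for the fat-tailed one, and at x+5s the two differ by more than three orders of magnitude. Raising the threshold does not repair the unit.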

Front page image: Renke's blog
