After four months of frenzy by over 1500 teams, the very successful Higgs Challenge launched by the ATLAS collaboration ended yesterday, and the "private leaderboard" with the final standings has been revealed. You can see the top 20 scorers below.
No big surprises, as the winner, Gabor Melis, had been leading the public leaderboard (which was computed using only 18% of the data) for most of the past three months. If you compare the final standings above with the public leaderboard below, however, you will notice some significant differences. Most notably, just yesterday Lubos Motl managed to rise to first place for the first time! In the final standings his team is instead in 9th position, with a true score quite a bit lower than its public leaderboard score.
How is that possible? Well, statistical fluctuations, of course. But there is more. If you examine the number of submissions (the column marked "entries" on the right), you will notice that Motl submitted as many as 589 tentative solutions. The public leaderboard reports the highest score among those 589. This is a case where the "look-elsewhere effect" applies: the highest score in a large sample is more strongly influenced by fluctuations than the highest score in a smaller one.
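To get a feeling for the size of this effect, here is a minimal toy simulation. It assumes that every submission has the same true score and that each public score fluctuates independently around it. Both the true score of 3.70 and the spread of 0.05 are made-up numbers, and real submissions are scored on the same public events, so their fluctuations are correlated; take it as a sketch of the mechanism, not as a model of the actual leaderboard.

```python
import random
import statistics

TRUE_SCORE = 3.70    # hypothetical true AMS shared by every submission
NOISE_SIGMA = 0.05   # hypothetical spread of a public score around the true one


def best_public_score(n_submissions, rng):
    """Highest public score among n equally good submissions."""
    return max(rng.gauss(TRUE_SCORE, NOISE_SIGMA) for _ in range(n_submissions))


def mean_best_score(n_submissions, n_trials=500, seed=0):
    """Average of the best public score over many simulated contests."""
    rng = random.Random(seed)
    return statistics.mean(
        best_public_score(n_submissions, rng) for _ in range(n_trials)
    )


if __name__ == "__main__":
    # The more submissions, the further the best public score drifts
    # above the true score, although no submission is actually better.
    for n in (1, 10, 100, 589):
        print(n, round(mean_best_score(n), 3))
```

In this toy the best of many submissions sits well above the true score even though all of them are equally good, and the excess keeps growing with the number of entries.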
Another way of looking at it is the following: by changing and tweaking the algorithm that classifies the events, and by changing the threshold above which an event is called "signal", one may at times reach high values of the AMS score (the approximate median significance, which basically tells you how well your classification has performed for that particular ordering of the events). But if you do this too many times, without a proper way to decide a priori which classification is best, you end up picking a positive fluctuation in the data. The same algorithm, then run on the "private" dataset, may instead produce a very different outcome - which is what you see in the private leaderboard, the final standings list.
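The threshold tuning described above can be sketched with a toy example. The `ams` function below follows the definition given, as far as I recall it, in the challenge documentation, AMS = sqrt(2((s + b + b_reg) ln(1 + s/(b + b_reg)) - s)) with b_reg = 10. Everything else is an assumption made for illustration: the Gaussian classifier scores, the sample sizes, the threshold grid, and the use of unweighted event counts in two equal-sized "public" and "private" samples (the real challenge used weighted events).

```python
import math
import random

B_REG = 10.0  # regularisation constant in the AMS definition


def ams(s, b, b_reg=B_REG):
    """Approximate median significance for s signal and b background counts."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def make_sample(rng, n_sig=50, n_bkg=500):
    """Toy events as (classifier score, is_signal); signal scores peak higher."""
    events = [(rng.gauss(0.65, 0.15), True) for _ in range(n_sig)]
    events += [(rng.gauss(0.35, 0.15), False) for _ in range(n_bkg)]
    return events


def ams_at(events, threshold):
    """AMS obtained by calling 'signal' every event scoring above threshold."""
    s = sum(1 for score, is_sig in events if is_sig and score > threshold)
    b = sum(1 for score, is_sig in events if not is_sig and score > threshold)
    return ams(s, b)


THRESHOLDS = [i / 40 for i in range(41)]  # hypothetical grid from 0 to 1


def tune_threshold(events, thresholds=THRESHOLDS):
    """Pick the threshold with the highest AMS on the given events."""
    return max(thresholds, key=lambda t: ams_at(events, t))


def mean_public_private_gap(n_trials=200, seed=0):
    """Average of (public AMS - private AMS) at the publicly tuned threshold."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        public, private = make_sample(rng), make_sample(rng)
        t = tune_threshold(public)
        gaps.append(ams_at(public, t) - ams_at(private, t))
    return sum(gaps) / n_trials


if __name__ == "__main__":
    print("mean public - private AMS:", mean_public_private_gap())
```

Since the threshold is chosen to maximise the AMS on the public sample, the public value is by construction at least as high as at any other threshold on the grid, while the private sample only sees the threshold that happened to win. On average the gap between the two should therefore not be negative.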
Among the 10 best final scorers, Motl is the one who submitted the most solutions (589), and his difference between public and private score is by far the largest, at 0.091; Gabor Melis, the contest winner, submitted fewer than a fifth as many solutions, and his difference between public and private score is less than half of Motl's. Also note that none of the top 10 scorers has a negative difference between public and private scores!
You can have more fun with the numbers of all the 1792 submitters if you want - the data are on the Kaggle Higgs site. At this point I am more interested in knowing which algorithms the winners used... Anyway, it was a great contest, and it also allowed me to test an algorithm I devised myself. That one could not compete with the top scorers, as it is not a "learning" algorithm; but I am quite happy to have scored above the simplest of the "boosted" classifiers anyway (you can find my entries in 750th place on the private leaderboard).
Finally, you might recall that three months ago Lubos Motl wagered $100 that nobody would surpass the AMS=3.80 mark in the final score. He lost that bet by an inch... In his blog he mentions this at the end of a very long post on the matter, which I found, however, a bit too detailed for my taste.