The Italian "Accademia Nazionale dei Lincei" is an old institution, founded in 1603 to promote and cultivate the study of the natural sciences. It counted Galileo Galilei among its members, and it has never ceased to pursue its goal. Nowadays it is a centre of cultural excellence and is among the advisors of the President of the Republic.

The Accademia divides its scientific activities into many branches, with committees and international relations, prizes, conferences and symposia, publications, and all that. On March 5 it hosted a one-day symposium on Artificial Intelligence, which featured a very distinguished set of lecturers from computer science, biology, physics, and other disciplines. The focus was on applications of AI and on the ethical and societal fallout that the new technologies produce. The program of the symposium is here.

Because of the above, I was delighted to be invited to the event, to discuss the impact of Artificial Intelligence on fundamental physics research. In the 40 minutes of my lecture I was careful to avoid the temptation of delving too deep into details that could be of little interest to the audience, which could also follow the proceedings via live streaming (over 250 participants).
I organized my presentation in three parts: a historical introduction on the path to AI and on the true meaning, and true ingredients, of artificial intelligence; a broad overview of the advancements that the new techniques have offered to research in particle physics; and a concluding discussion of what lies ahead, focused on my present research, which centers on harnessing differentiable programming techniques for the end-to-end optimization of the design of instruments and measurement systems (this is the topic of the scientific collaboration I coordinate, MODE).

My talk was very well received, and it also served as a way to start post-mortem conversations on the above topics with some of the other lecturers. Below I will give a very short summary of the lecture - you will excuse me if I cannot offer a full transcript.


What is artificial intelligence? What is intelligence? How do we define learning? And does it all matter?

The above are not idle questions - in fact, a number of great minds have pondered how best to answer them. For you well understand that a proper definition is *very* important, as it allows us to really get to the heart of the matter. In my introduction I discussed several ways to provide an answer. In particular, take the first question. Nowadays we are accustomed to smartphones; self-driving cars don't make the headlines anymore; and language translation causes no awe. We would not call what powers them "artificial intelligence" any more than we would call intelligent the automatic windshield wipers our cars are equipped with, which detect rain and start up on their own.

But all the above applications were once thought to be clear, futuristic examples of artificial intelligence. Hence what has arisen is the idea that "AI is what hasn't been done yet": but this definition makes AI a moving target, and we should be more serious. Indeed, a system that can learn from its environment to adapt its operation and improve its performance can be called an artificially intelligent system. And there are plenty of these around nowadays.

Learning is the key. In the 1980s, expert systems were a makeshift substitute for what had been predicted in the previous decade to be the tools of a revolutionary future. An expert system uses the knowledge of an expert to perform its tasks: it does not learn. What it takes for a machine to learn is an infrastructure that allows it to process data and make sense of them, so as to improve its performance. But how do machines learn? Well, they learn in a way that is not so different from ours - by exploiting analogies.

Analogy is really at the heart of how humans and animals learn. We learn unknown features of a new object, for instance, by creating analogies with objects we know, and predicting the behavior of the new object by relating it to the known one. I have always had a soft spot for analogies in my teaching, too - I think they are as powerful a tool as you can get, if you can pull them off successfully [I wrote an essay on the topic a decade ago; I should dig it out some day]. And I was delighted to read Douglas Hofstadter's book "L'analogie, coeur de la pensée" in the recent past, as I found endless inspiration on this topic there. Yes, that Hofstadter - the one who once wrote the groundbreaking book "Gödel, Escher, Bach", which not by chance also dealt with artificial intelligence.

But I am digressing. My point, though, is that if we learn by constructing analogies, there is an even more fundamental ability that we commonly employ in doing so. It is called classification. Only by assigning unknown objects to their equivalence class can we hope to then infer new behaviour in them, by observing the features of the other objects we placed in the same bin. So we classify objects, events, feelings, words, sentences, sounds, smells, and by doing so we learn more about our environment and about ourselves.

Classification is the heart of the matter

Do machines do that? Well, yes. Classification is really a very well-defined task that we use to allow machines to learn structure in data. And just as we use classification in our own learning process, science progresses by creating effective classification schemes for natural objects and phenomena. Think of Darwin's tree of species, or Mendeleev's table of the elements: by putting elements and living beings into tables and trees, not only did these scientists understand the underlying structure of living beings and matter; this also, crucially, allowed them to infer the existence of new, as-yet-unseen objects that had to fit into the voids of the structures they had created. That is how, e.g., Mendeleev predicted the existence of what was later called germanium, and of other elements.

The same mechanism produced, in 1964, the quark model, the skeleton upon which the Standard Model of particle physics is constructed. So we have to agree that classification is still a powerful tool in the hands of experimental scientists. And again, nowadays boosted decision trees or neural networks used for standard binary classification are considered little more than a box with a crank to turn - but those tools were instrumental in the discovery of the Higgs boson in 2012!
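To make the idea concrete, here is a minimal sketch of the kind of binary classification I am referring to - a toy of my own, not the actual code of any Higgs analysis: a boosted decision tree trained to separate "signal" from "background" events described by a few made-up kinematic features.

```python
# Toy sketch (illustrative only): a boosted decision tree separating two
# classes of simulated "events" described by three hypothetical features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20000

# Background: features centred at zero; signal: slightly shifted distributions.
background = rng.normal(loc=0.0, scale=1.0, size=(n, 3))
signal     = rng.normal(loc=0.5, scale=1.0, size=(n, 3))

X = np.vstack([background, signal])
y = np.concatenate([np.zeros(n), np.ones(n)])   # 0 = background, 1 = signal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3)
bdt.fit(X_train, y_train)

scores = bdt.predict_proba(X_test)[:, 1]        # per-event "signal-ness"
print("ROC AUC:", roc_auc_score(y_test, scores))
```

The output of such a "box with a crank" is just a per-event score; what made it so valuable in real analyses is that the score concentrates the discriminating power of many weakly informative features into a single variable.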

Uses in HEP



The speaker who lectured the Lincei before me was Riccardo Zecchina. Prof. Zecchina is a professor of theoretical physics at Bocconi University in Milano, and a highly recognized expert in computer science. He gave an enlightening presentation, where among a number of other things he explained that a turning point for computer science came in 2012, when the error rate in classifying images was suddenly brought down by a large margin through the use of deep network architectures, which could generalize much better than imagined thanks to an overparametrization of the problem.

I exploited that assist in my presentation, as I could relate the error margin he quoted for image identification, in the tenths-of-a-percent range, to the fraction of true Higgs bosons present in LHC collision data, which amounts to fractions of a part per billion! But we pulled that off, and it relied heavily on machine learning tools. Artificial intelligence? Maybe not, maybe just algorithms. But really, these did things that we could not do ourselves. So 2012 was a turning point for the use of these methods in particle physics, too.

I then gave a broad overview of the advancements that supervised learning techniques have produced in the past decade in data analysis for the large LHC experiments. From the identification of b-quarks in hadronic jets, to the identification of boosted decays of heavy particles, to the triggering of events using ultra-low-latency neural networks built on FPGAs, there was a lot to discuss indeed.
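As a flavour of what "supervised learning on jets" means in practice, here is another toy sketch of my own - the real taggers are enormously more sophisticated, and the two input features below are hypothetical stand-ins for quantities a b-tagger might use, such as the track impact-parameter significance and the mass of a reconstructed secondary vertex.

```python
# Toy sketch (illustrative only): a small neural network classifying jets as
# b-jets or light jets from two hypothetical jet-level features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 10000

# Columns: [impact-parameter significance, secondary-vertex mass (GeV)]
light_jets = np.column_stack([rng.normal(0.0, 1.0, n), rng.exponential(0.5, n)])
b_jets     = np.column_stack([rng.normal(3.0, 2.0, n), rng.exponential(1.8, n)])

X = np.vstack([light_jets, b_jets])
y = np.concatenate([np.zeros(n), np.ones(n)])   # 1 = b-jet

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=500)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```

The networks deployed in the trigger are of a similar small size precisely so that they can be synthesized on FPGAs and evaluated within microseconds.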

I of course ended my talk by pointing out that there is one piece of business that HEP physicists have not yet handed over to computer software tools: the design of instruments - the detectors that measure particles, that is. And I argued that it is foolish to ignore the huge advantages that gradient descent methods can offer if one is capable of parametrizing all the important parts of a design problem in a differentiable pipeline. Again, to know more about this, please visit the website of the MODE collaboration.
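To give a flavour of what such a pipeline looks like, here is a heavily simplified sketch of my own (it assumes the jax library for automatic differentiation, and it is not MODE's actual software): a single design parameter, the thickness of an absorber layer, is tuned by gradient descent on a differentiable surrogate that trades off resolution against cost.

```python
# Toy sketch (illustrative only): gradient descent over one detector design
# parameter, using a smooth, differentiable surrogate objective.
import jax.numpy as jnp
from jax import grad

def resolution(t):
    # Hypothetical surrogate: resolution improves with thickness, then saturates.
    return 0.05 + 0.30 * jnp.exp(-t / 10.0)

def cost(t):
    # Hypothetical linear material cost.
    return 0.002 * t

def objective(t):
    # Figure of merit to minimize: resolution penalized by cost.
    return resolution(t) + cost(t)

d_objective = grad(objective)   # automatic differentiation through the pipeline

t = 5.0       # starting thickness (arbitrary units)
lr = 200.0    # learning rate, tuned for the scale of this toy
for step in range(200):
    t = t - lr * d_objective(t)   # plain gradient descent on the design parameter

print(f"optimized thickness ~ {float(t):.2f}, objective = {float(objective(t)):.4f}")
```

In a realistic setting the objective is not a one-line formula but the output of a simulated and reconstructed measurement, and the design parameters number in the hundreds or thousands; the point of the approach is that automatic differentiation lets the same gradient descent machinery scale to that case.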

I think this piece is long enough to have reduced the number of readers who are still with me down here by a factor of 100. So, to the 23 readers that remain, I can only say, thank you! Please leave a note in the comments thread, and maybe we can continue or deepen the discussion there.




---

Tommaso Dorigo (see his personal web page here) is an experimental particle physicist who works for the INFN and the University of Padova, and collaborates with the CMS experiment at the CERN LHC. He coordinates the MODE Collaboration, a group of physicists and computer scientists from eight institutions in Europe and the US who aim to enable end-to-end optimization of detector design with differentiable programming. Dorigo is an editor of the journals Reviews in Physics and Physics Open. In 2016 Dorigo published the book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab", an insider view of the sociology of big particle physics experiments. You can get a copy of the book on Amazon, or contact him to get a free pdf copy if you have limited financial means.