Wait a minute - why is an article about automatic differentiation labeled under the "Physics" category? Well, I will explain that in a minute. First of all, let me explain what automatic differentiation is.
Computing derivatives of functions is a rather error-prone job. Maybe it is just me, but if you give me a complex function where the dependence on a variable is distributed among several sub-functions, I am very likely to find N different results if I do the calculation N times. Yes, I am 57 years old, and I should be handling other things and leaving these calculations to younger lads, I agree.
But it so happens that, as a particle physicist, I have observed that the field needs to embrace the power of new computing methods in order to obtain more from the design of experiments. Optimization is the word. To optimize a system you need to create a mathematical model of how it works, and express the objective of the task as a function of the model parameters. At that point you will be able to compute the derivative of the objective function with respect to the system design parameters. And then magic happens!
If you know how your objective function varies - because you know the value of its derivative with respect to each of the parameters that define the system you are studying - then you can move the parameters in the direction that most increases the objective, and walk uphill until you find its maximum. This procedure, known for ages in the maximization of likelihoods and other mathematical problems, can also be applied to systems that rely on stochastic data to compute the objective. It is called stochastic gradient descent (I described ascent to a maximum, but of course just change the sign and the concept is the same), and it is the engine under the hood of most machine learning algorithms nowadays.
[Animated GIF: how different optimization algorithms handle gradient descent]
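To make the recipe concrete, here is a minimal sketch in C++ (my language of choice, as you will see below) of gradient ascent on a made-up two-parameter objective; the function, step size, and number of iterations are of course just for illustration:

#include <cmath>
#include <cstdio>

// Toy objective: a smooth bump with its maximum at (a, b) = (1, -2).
double objective(double a, double b) {
    return std::exp(-((a - 1.0) * (a - 1.0) + (b + 2.0) * (b + 2.0)));
}

// Hand-computed partial derivatives of the objective.
double dobj_da(double a, double b) { return -2.0 * (a - 1.0) * objective(a, b); }
double dobj_db(double a, double b) { return -2.0 * (b + 2.0) * objective(a, b); }

int main() {
    double a = 0.0, b = 0.0;   // starting guess for the design parameters
    const double rate = 0.5;   // learning rate (step size)
    for (int step = 0; step < 1000; ++step) {
        // Move each parameter in the direction that increases the objective.
        double ga = dobj_da(a, b);
        double gb = dobj_db(a, b);
        a += rate * ga;
        b += rate * gb;
    }
    std::printf("maximum found near a = %.3f, b = %.3f\n", a, b);
    return 0;
}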
So, let us all compute derivatives and spend our time happily coding our design problems into complicated computer programs! The problem with this approach, though, is that keeping track of the derivatives with respect to the parameters of your model can become incredibly complex if handled manually. But there are tools today that do this for you: you write code that computes some quantity, and the software keeps track of all the relevant derivatives. PyTorch, TensorFlow, JAX, and other tools do this seamlessly. Great. Or not so much.
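What these tools do under the hood is essentially bookkeeping: every intermediate quantity carries its derivative along with its value, updated term by term with the chain rule. A minimal sketch of that idea - so-called forward-mode automatic differentiation with "dual numbers", written here in C++ rather than Python - could look like this:

#include <cmath>
#include <cstdio>

// A value bundled with its derivative with respect to one chosen parameter.
struct Dual {
    double val;  // the value itself
    double dot;  // its derivative (the "dot value")
};

// Each arithmetic operation propagates the derivative by the chain rule.
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.dot * b.val + a.val * b.dot}; }
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

int main() {
    // Differentiate f(x) = x * sin(x) + x at x = 1.5, without writing f' by hand.
    Dual x{1.5, 1.0};            // dot = 1 because we differentiate with respect to x
    Dual f = x * sin(x) + x;
    std::printf("f(1.5)  = %.6f\n", f.val);
    std::printf("f'(1.5) = %.6f\n", f.dot);  // equals sin(x) + x*cos(x) + 1
    return 0;
}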
The thing is, I am 57. And I am slow at learning new computer languages. 25 years ago I was living a happy life coding in FORTRAN, and the world changed under my feet - particle physics moved to C++. I had to adapt to it, and it took me a lot of effort. But I was half the age I am now. And now that all these fancy tools are available in Python, I am still stuck with C++. And so I compute derivatives by hand.
The software I am working on is trying to find optimized configurations for an astrophysics detector, SWGO. I wrote elsewhere about that and will write more in the future - it is a project that enthuses me. But it makes no sense to discuss it here. My code has grown to over 6000 lines of C++, and it contains the result of literally hundreds of pages of handwritten calculations of derivatives. I have checked and rechecked them, but how can I be sure that they are correct?
Of course, to some extent the results of running the program help: I can see that the program is finding optimal configurations by following my hand-computed gradients. But maybe hidden somewhere there is still a bug or three...
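One crude but independent sanity check is numerical: compare each hand-coded derivative with a finite-difference estimate. A minimal sketch, with a toy function standing in for the real code:

#include <cmath>
#include <cstdio>

// Stand-in for a piece of the real objective, with its hand-coded derivative.
double f(double x)            { return std::sin(x) * std::exp(-x * x); }
double dfdx_by_hand(double x) { return (std::cos(x) - 2.0 * x * std::sin(x)) * std::exp(-x * x); }

int main() {
    const double h = 1e-6;
    for (double x = -2.0; x <= 2.0; x += 0.5) {
        // Central finite difference as an independent, if less precise, reference.
        double dfdx_numeric = (f(x + h) - f(x - h)) / (2.0 * h);
        std::printf("x = %5.2f   by hand: %+.8f   numeric: %+.8f\n",
                    x, dfdx_by_hand(x), dfdx_numeric);
    }
    return 0;
}

This kind of check is cheap to set up, but it only probes a handful of points and suffers from round-off; a proper automatic-differentiation cross-check is much more convincing.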
Enter Derivgrind. It is a piece of software written by Max Aehle and collaborators at the University of Kaiserslautern. I know Max as he is a member of the MODE collaboration, and we recently met in Princeton for a workshop of that group. When he saw the extent of the mess I had created with my optimization code, he offered to help by teaching me how to use his software. Derivgrind can compute derivatives automatically, and it works with C++ code. When I understood that I could finally verify the correctness of my calculations, I was ecstatic!
Today, after some guided installation, we were able to test the calculations in my code. By just inserting the Derivgrind libraries in the code, and adding a few extra calls, you get to compare analytic and automatic calculations. It is easy and fun! If you want to stick to C++ and compute derivatives by hand, but need an electronic eye looking over your shoulder, you should definitely download it and use it.
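To give a flavour of what those few extra calls look like: you seed the input variable with a "dot value" of one, let your code run unchanged, and read back the dot value of the output, which should then match the hand-computed derivative. The sketch below is only indicative - the header path, the macro names (DG_SET_DOTVALUE, DG_GET_DOTVALUE) and the valgrind invocation are quoted from memory of the Derivgrind documentation, so check the project's README for the exact interface of your version:

#include <cstdio>
#include <valgrind/derivgrind.h>  // header path and macro names as I recall them; verify against the docs

// Some existing function whose derivative I also coded by hand.
double objective(double x)         { return x * x * x; }
double objective_by_hand(double x) { return 3.0 * x * x; }  // the analytic derivative

int main() {
    double x = 2.0;
    double one = 1.0;
    // Seed the dot value of the input: d(x)/d(x) = 1.
    DG_SET_DOTVALUE(&x, &one, sizeof(double));

    double y = objective(x);   // ordinary C++ code, untouched

    double y_dot = 0.0;
    // Read back the automatically propagated derivative d(y)/d(x).
    DG_GET_DOTVALUE(&y, &y_dot, sizeof(double));

    std::printf("automatic: %.6f   by hand: %.6f\n", y_dot, objective_by_hand(2.0));
    return 0;
}
// Then run the program under the Valgrind tool, e.g.: valgrind --tool=derivgrind ./a.out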
Derivgrind is available on GitHub here. An article describing it is here.