As an Applied Mathematician, I like to use mathematical modeling and computational techniques to try to better understand how things work in the world around me. One application I have studied over a number of years is how to compute the number of runs (and their distribution) for a team of baseball players with realistic data.
An important aspect of baseball which distinguishes it from many other sports is that baseball can be thought of as a sequence of one-on-one battles – the pitcher vs. the batter – whereas in sports such as basketball and hockey, the other players on the court or in the rink play a major role in the action. In baseball, the other players on the team surely contribute toward the outcome of the game, but the pitcher vs. batter duels are far more significant.
An important aspect of baseball which CANNOT be neglected is that the order of events matters. A home run followed by a single (and then 3 outs) leads to one run, while a single followed by a home run (and then 3 outs) leads to two runs. Therefore, any attempt to evaluate how often a lineup should score no runs, one run, two runs and so forth must take the order of events into account.
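To make the ordering point concrete, here is a minimal Python sketch. It assumes a deliberately crude advancement rule (every runner moves exactly as many bases as the batter), and the name `runs_from_sequence` is only an illustrative choice, not part of any real model.

```python
# Number of bases gained on each kind of hit.
HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def runs_from_sequence(events):
    """Count runs scored in one half-inning for a given ordered list of events.

    Assumption (for illustration only): on a hit worth n bases, the batter
    and every runner advance exactly n bases. An "OUT" leaves runners put.
    """
    bases = [False, False, False]  # first, second, third
    runs = 0
    outs = 0
    for event in events:
        if event == "OUT":
            outs += 1
            if outs == 3:
                break  # the half-inning is over
            continue
        n = HIT_BASES[event]
        # Position 0 is the batter; positions 1..3 are occupied bases.
        runners = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
        bases = [False, False, False]
        for position in runners:
            new_position = position + n
            if new_position >= 4:
                runs += 1  # runner crosses home plate
            else:
                bases[new_position - 1] = True
    return runs


hr_then_single = runs_from_sequence(["HR", "1B", "OUT", "OUT", "OUT"])
single_then_hr = runs_from_sequence(["1B", "HR", "OUT", "OUT", "OUT"])
print(hr_then_single, single_then_hr)
```

The same five events produce different run totals depending only on their order, which is why a model that just counts hits cannot give a run distribution.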
The first step in evaluating a lineup’s performance is to collect data for how you expect each player to perform (say, the probability of the player getting a single, double, triple, home run, walk or out against the opposing team’s pitchers). A model for how baserunners advance must also be considered (how often does a man on first make it to second or to third base when the batter gets a single).
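One possible way to organize this input data is sketched below in Python. The class name `BatterProfile`, the field names, and all the numbers are hypothetical placeholders chosen for illustration; they are not real player statistics or the format used in the actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatterProfile:
    """Per-plate-appearance probabilities for one batter (hypothetical layout)."""

    name: str
    single: float
    double: float
    triple: float
    home_run: float
    walk: float

    def __post_init__(self):
        values = (self.single, self.double, self.triple, self.home_run, self.walk)
        if min(values) < 0.0 or sum(values) > 1.0:
            raise ValueError("probabilities must be non-negative and sum to at most 1")

    @property
    def out(self) -> float:
        # Whatever probability is left over is the chance of making an out.
        return 1.0 - (self.single + self.double + self.triple
                      + self.home_run + self.walk)


# Made-up numbers, for illustration only.
batter = BatterProfile("Hypothetical Batter", single=0.16, double=0.05,
                       triple=0.005, home_run=0.03, walk=0.08)

# A baserunner-advancement model can be stored the same way, e.g. the chance
# that a runner on first reaches third on a single (again a made-up value).
FIRST_TO_THIRD_ON_SINGLE = 0.30

print(batter.name, round(batter.out, 3))
```

Keeping the out probability as the remainder guarantees that each batter's probabilities always sum to one.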
One approach is direct simulation. You can set up a lineup, make some rules and simulate thousands or even millions of games to compute a run distribution or expected number of runs for that lineup, or do the same for two teams to compute the probability of winning a game. Because the method is random, repeating the simulation gives slightly different results each time. Advantages of this method are that it is pretty easy to code up and that it can handle various other issues (such as lineup changes based on the current score of the game, pitching changes and extra innings) without much hassle. A disadvantage is that the results always carry error bars, and many simulated games are needed to shrink the interval in which you can be confident that the right answer lies.
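A bare-bones version of such a simulation might look like the following Python sketch. It makes several simplifying assumptions that are not from the model described here: every batter is identical, the event probabilities in `WEIGHTS` are made up, runners advance exactly as many bases as the batter on a hit, a walk only moves forced runners, and every game is exactly nine half-innings.

```python
import math
import random
import statistics
from collections import Counter

# Hypothetical per-plate-appearance probabilities for one "average" batter.
EVENTS = ["OUT", "BB", "1B", "2B", "3B", "HR"]
WEIGHTS = [0.675, 0.08, 0.16, 0.05, 0.005, 0.03]
HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def simulate_half_inning(rng):
    """Play one half-inning and return the number of runs scored."""
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    while outs < 3:
        event = rng.choices(EVENTS, weights=WEIGHTS)[0]
        if event == "OUT":
            outs += 1
        elif event == "BB":
            # Only forced runners move on a walk.
            if bases[0] and bases[1] and bases[2]:
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            n = HIT_BASES[event]
            runners = [0] + [i + 1 for i, on in enumerate(bases) if on]
            bases = [False, False, False]
            for position in runners:
                if position + n >= 4:
                    runs += 1
                else:
                    bases[position + n - 1] = True
    return runs


def estimate(n_games, seed=0):
    """Return (mean runs per game, standard error, run distribution)."""
    rng = random.Random(seed)
    totals = [sum(simulate_half_inning(rng) for _ in range(9))
              for _ in range(n_games)]
    mean = statistics.fmean(totals)
    std_error = statistics.stdev(totals) / math.sqrt(n_games)
    distribution = {r: c / n_games for r, c in sorted(Counter(totals).items())}
    return mean, std_error, distribution


for n in (250, 4000):
    mean, std_error, _ = estimate(n, seed=2)
    print(f"{n} games: {mean:.2f} runs/game, standard error {std_error:.3f}")
```

The standard error shrinks only like one over the square root of the number of games, so cutting the error bar in half takes about four times as many simulated games.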
Another way to model baseball for the purposes of computing
run distributions is to use the concept of Markov processes. This method is
appropriate for modeling processes in which it does not matter how a certain
situation has arisen (for example, man on first with 2 outs and David Wright
coming to bat – we don’t care how we got to this situation, just that we are in
the situation) and that we know the probability of the current batter changing
the current situation to any of the other possible situations.
In baseball,
there are only 25 possible situations: 3 out situations (0, 1, or 2 outs) times
8 base runner situations (no one on base, man on first etc. through bases loaded)
plus the 3 outs state. By arranging the computation efficiently one can compute
the run distributions for 2 lineups and use basic probability ideas to compute
the probability of each team winning a game in about one second on a typical
laptop computer. One can also use this technique to compute how well a team
should do during a season, who should win the MVP and Cy Young awards, whether
a trade should produce more or fewer wins for a team, who should win a
post-season series and more.
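The count of situations and the "basic probability ideas" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two teams' run totals are treated as independent, the function names are invented for this example, the tie-splitting rule is one simple choice among several, and the run distributions in the usage lines are toy numbers rather than output of the real model.

```python
# The 25 situations: (outs, bases) with bases as a 3-bit mask, plus 3 outs.
STATES = [(outs, bases) for outs in range(3) for bases in range(8)] + ["END"]


def win_probabilities(dist_a, dist_b):
    """Return (P(A scores more), P(B scores more), P(tie after regulation)).

    dist_a[k] and dist_b[k] are the probabilities that each team scores
    exactly k runs. The two totals are assumed to be independent.
    """
    p_a = p_b = p_tie = 0.0
    for runs_a, pa in enumerate(dist_a):
        for runs_b, pb in enumerate(dist_b):
            joint = pa * pb
            if runs_a > runs_b:
                p_a += joint
            elif runs_a < runs_b:
                p_b += joint
            else:
                p_tie += joint
    return p_a, p_b, p_tie


def split_ties(p_a, p_b, p_tie):
    """Assumption: ties are awarded in proportion to the non-tie win chances."""
    decided = p_a + p_b
    if decided == 0.0:
        return 0.5, 0.5
    return p_a + p_tie * p_a / decided, p_b + p_tie * p_b / decided


# Toy run distributions (probabilities of scoring 0, 1, 2 or 3 runs).
team_a = [0.2, 0.3, 0.3, 0.2]
team_b = [0.3, 0.3, 0.2, 0.2]

print(len(STATES))
print(split_ties(*win_probabilities(team_a, team_b)))
```

Because this step is just a double sum over two short lists, it adds almost nothing to the running time once the run distributions are known.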
Though some might disagree, I would argue that the
Markov approach enables one to evaluate situations with small distinctions
better than simulations – for example, the value of a single vs. a double or
which of two players if placed on an average team would have added more value
to that team’s performance.
Each year as we enter the Major League Baseball postseason
period, I post updates as to the probability of each team still in contention
to win its current series. These updates for 2013 can be found at: http://m.njit.edu/~bukiet/baseball/playoffs13.htm.
At this time, the probability of the Reds advancing to the National
League Division Series is 46%, while the Pirates have a 54% chance of advancing
to face the Cards. The Rays and Indians each have a 50% chance of making it
to the American League Division Series against the Red Sox.
Though Francisco Liriano was 8-1 with a 1.47 ERA at PNC Park this season, the math says you shouldn't bet the house - Johnny Cueto has a 1.90 ERA in 13 starts in his career at PNC Park and has held hitters to a .544 OPS there. Credit and link: ESPN
For the two
division series for which we know the contenders, in the National League, the
Dodgers have a nearly 2 in 3 chance to eliminate the Braves, while in the
American League the Tigers have a 56%-44% advantage over the Oakland A’s. These
probabilities change as the games are played and so updates will be posted at
the site regularly throughout the post-season.
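To see why series probabilities move as games are played, consider the following simplified Python sketch. It assumes a single fixed per-game win probability and independent games, which is cruder than a model that accounts for each game's lineups and pitchers; the function name and the 50% figure in the example are illustrative only.

```python
from math import comb


def series_win_probability(p_game, wins_needed_a, wins_needed_b):
    """Probability that team A wins the series.

    p_game        : assumed probability that A wins any single game
    wins_needed_a : wins A still needs to clinch
    wins_needed_b : wins B still needs to clinch
    Assumption: games are independent with the same p_game each time.
    """
    if not 0.0 <= p_game <= 1.0:
        raise ValueError("p_game must be between 0 and 1")
    if wins_needed_a <= 0:
        return 1.0
    if wins_needed_b <= 0:
        return 0.0
    q = 1.0 - p_game
    # A clinches on its final needed win after B has won k games,
    # for k = 0 .. wins_needed_b - 1.
    return sum(comb(wins_needed_a - 1 + k, k) * p_game ** wins_needed_a * q ** k
               for k in range(wins_needed_b))


# Best-of-five between evenly matched teams, before and after A wins game 1.
before = series_win_probability(0.5, 3, 3)
after = series_win_probability(0.5, 2, 3)
print(before, after)
```

Under these assumptions, an evenly matched team that wins the first game of a best-of-five series goes from a 50% chance to a 68.75% chance of advancing.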
Also, for the past 13 seasons, I have used the Markov approach to determine whether it is worthwhile to wager on baseball games. Even with updating the data and lineups only about 3 times per season, the model has (slightly) more than made up for the spread. All those results are posted at www.egrandslam.com along with the “Daily Picks”.
Baseball is just one of the many, many aspects of life that can
be investigated using mathematics and I use it to demonstrate the power of math
in real-world situations to students at all levels. Personally, I have been
privileged to be able to apply mathematics to understand situations ranging
from food safety, to medical and biological issues and even explosives. So I
can say with authority that math is a real blast.
May the Power of Math be with
you!