Tuesday, June 21, 2016

Grandma's Marathon 2016: How much did the heat slow your time? A statistical analysis

Photo: Drew Geraets
The 40th annual Grandma's Marathon was held this past weekend in Duluth, Minnesota.  Grandma's is a staple of the Midwest marathon scene and is perennially praised as one of the best races in America.  The point-to-point course is a beautiful route that starts in the north woods of Two Harbors and follows the shore of Lake Superior into downtown Duluth.  With a slight net elevation loss and very few hills, the course is also usually a fast one.

This year, however, many participants were disappointed with their times.  Warmer than average temperatures and clear, sunny skies caused many runners to finish well back from their goals.  Since several runners that I coach or advise ran the race, I was curious to see how much of an effect the temperature had on their finish times.  So, as I often do, I started crunching some data.

Fortunately, I was able to stand on the shoulders of some Big Running Data giants: a 2012 scientific paper by Nour El Helou and other researchers in France had already laid the groundwork for disentangling the effects of climate on marathon race times.  In their paper, El Helou et al. analyzed ten years' worth of results from six major marathons (London, Berlin, Paris, Boston, Chicago, and New York), resulting in a data set of 60 marathons with almost 1.8 million finishers in total.  They ran a statistical analysis on each race's results to find the correlation between the ambient temperature during the race and the distribution of finish times.

El Helou et al.'s methods

Because El Helou et al. (correctly) hypothesized that temperature would have varying effects on runners of different abilities, they analyzed several levels of performance: the 1st, 25th, 50th, and 75th percentiles of male and female finishers.  So, for example, if the 2010 Chicago Marathon had 21,000 male finishers, the authors looked at the finish time for 210th place; that's the "one percentile" time.  This marker is more useful than the winning time or 10th place, because those can be affected by things like the quality of the elite field, the tactics employed by the lead pack, and so on.  After extracting the various levels of performance for the 60 marathons in the data set, El Helou et al. then consulted meteorological records to find the ambient temperature midway through each race.
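The "one percentile" lookup is simple enough to sketch. Below is a minimal Python version, assuming a results list already sorted fastest first; the function names are hypothetical, and rounding the place up is my assumption rather than a rule stated by El Helou et al.

```python
import math


def percentile_place(n_finishers: int, percentile: float) -> int:
    """1-based finishing place that marks a percentile, counted from the front."""
    # 1% of 21,000 finishers is 210th place. Rounding up (an assumption)
    # keeps small fields from producing "place 0".
    return max(1, math.ceil(n_finishers * percentile / 100))


def percentile_time(times_fastest_first, percentile):
    """Finish time at a percentile; the list must be sorted fastest first."""
    place = percentile_place(len(times_fastest_first), percentile)
    return times_fastest_first[place - 1]


# The four levels El Helou et al. looked at, for the hypothetical 21,000-man field
for p in (1, 25, 50, 75):
    print(f"percentile {p}: place {percentile_place(21_000, p)}")
```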

Doing regression analysis allowed El Helou et al. to correlate the ambient temperature with the distribution of finish times.  The broad trend in the results was not surprising: marathon times are slower when temperatures are too hot, and they are also slower when temperatures are too cold.  What was surprising, at least to me, was the optimal temperature for marathoning.  El Helou et al.'s data robustly shows that the ideal temperature for running a marathon is pretty chilly—39 degrees Fahrenheit (3.8° C) for a 2:40 marathon! Race times follow a parabolic curve, slowing significantly on either end of an optimal temperature. 

Further, the vertex of this parabola (the optimal temperature for running a marathon) changes as a function of how fast you're running.  The optimal temperature for a 4:30 marathon, for example, is more like 45° F (7.3° C).  As the graph below illustrates, the ideal temperature for both male and female runners follows a linear trend, with the fastest runners requiring substantially colder temperatures for optimal performance.
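As a rough illustration of that linear trend, the sketch below draws a straight line through the only two values quoted in this post (39° F for a 2:40 and 45° F for a 4:30). It is a stand-in for the regression line in the graph, not a reproduction of it, and it ignores the separate male and female lines.

```python
# The two values quoted above, as (ideal finish time in minutes, optimal °F)
FAST_POINT = (160.0, 39.0)  # 2:40 marathon
SLOW_POINT = (270.0, 45.0)  # 4:30 marathon


def optimal_temp_f(ideal_minutes: float) -> float:
    """Straight line through the two quoted points (extrapolates outside them)."""
    slope = (SLOW_POINT[1] - FAST_POINT[1]) / (SLOW_POINT[0] - FAST_POINT[0])
    return FAST_POINT[1] + slope * (ideal_minutes - FAST_POINT[0])


print(f"{optimal_temp_f(210):.1f} °F")  # a 3:30 marathon → 41.7 °F
```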


Notably, the top 1% of women are a vast outlier on this graph.  Because of this, I did not include them in the data analysis.

I'm fascinated by this.  Does it represent a true physiological phenomenon, or is it just a statistical quirk? I contacted El Helou et al. several months ago to see if they had looked further into this wild anomaly, but I received no response.  I am currently conducting a similar data analysis on another set of marathon results that were not included in El Helou et al.'s paper (including the history of Grandma's Marathon) to see if I can replicate this finding.  I will publish these results once I'm finished! It's particularly hard to find large marathons that are held in cold conditions (45° F / 7.3° C or colder), so if you know of any, drop me a line.

I do have one possible explanation: top female marathoners tend to be very small in stature, even by distance runner standards.  A male 2:40 marathoner might easily be 5'9" to 6'0" tall and weigh 150 or 160 pounds, while a female running the same time is likely to be substantially lighter and shorter.  Because surface area scales with the square of body size while volume (and therefore mass) scales with its cube, top female runners have a very high skin surface area to body mass ratio, meaning they shed heat much more effectively than a taller or heavier runner.  This might cause them to lose too much heat in cold conditions that would otherwise favor fast times.  This may have implications for top international male runners too.
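To put rough numbers on the scaling argument, here is a sketch using the Du Bois body-surface-area formula, a standard clinical estimate that I'm swapping in for illustration (it is not part of El Helou et al.'s analysis). The two body sizes are hypothetical examples, not measurements of any real runner.

```python
LB_TO_KG = 0.45359237
IN_TO_CM = 2.54


def du_bois_bsa_m2(weight_kg: float, height_cm: float) -> float:
    """Body surface area in m^2 (Du Bois & Du Bois, 1916)."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def area_per_kg(weight_lb: float, height_in: float) -> float:
    """Skin surface area per kilogram of body mass, in cm^2/kg."""
    kg = weight_lb * LB_TO_KG
    return du_bois_bsa_m2(kg, height_in * IN_TO_CM) * 10_000 / kg


# Hypothetical builds for illustration only, not measurements of real runners
male = area_per_kg(weight_lb=155, height_in=71)    # 5'11", 155 lb
female = area_per_kg(weight_lb=110, height_in=64)  # 5'4", 110 lb
print(f"male {male:.0f} cm^2/kg, female {female:.0f} cm^2/kg")
```

The smaller, lighter build comes out with more skin area per kilogram, which is the direction the hypothesis needs; whether the gap is large enough to matter physiologically is a separate question.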

Indeed, the shapes of the parabolas for a given time appear to differ between men and women: men tend to do comparatively worse in hot conditions than women, which is what we would expect if the surface area to body mass ratio hypothesis is true.

In any case, this outlier aside, it's clear to see that faster marathons require colder temperatures.  But that's not the only takeaway from El Helou et al.'s data.  Their statistical analysis also allows us to predict how much slower a marathon will be when temperatures are substantially hotter or colder than optimal. 

How much slower was Grandma's Marathon in 2016?

Since the methods I used to extract a general formula from El Helou et al. are very tedious, I'll jump right to the interesting part, which is Grandma's Marathon this year.  The table below illustrates my results.
Ideal time in optimal conditions
Optimal marathon temperature
Expected time at Grandma's 2016
38° F
39° F
41° F
42° F
44° F
45° F

The calculations are based on the simple average of the start-line temperature in Two Harbors at 7:45 am (65° F) and the finish-line temperature in Duluth at 10:55 am (75° F) on race day, which works out to 70° F.

At first glance, it appears that times were a lot slower, and this is certainly true.  I should point out, however, that conditions at Grandma's Marathon almost never fall within the optimal marathoning temperature for just about anyone.  At one hour and fifteen minutes into the race (9 am), the average temperature over the past twelve years has been 61° F.  Only twice has the temperature at 9 am been below 50° F.

It's also worth pointing out that some people handle heat better than others.  These are only averages; some people are likely affected much more, while others are affected less.  El Helou et al. did not provide error margins along with their data, so I can't say what the standard deviation looks like for these values.  Even if they had, I'm sure the error propagation would be massively challenging!


The bottom line remains, however—Grandma's Marathon was at least several minutes slower for most participants.  The prediction based on El Helou et al.'s model is supported by the real-world results.

In the past 11 years (2005-2015), the top one percentile of men at Grandma's has averaged a finish time of 2:35:10, and the average temperature has been 60° F.  This year, the top one percentile was 2:38:28, a difference of three minutes and 18 seconds.  The generalized formula predicts that a 2:33:03 in perfect conditions becomes 2:35:10 at 60° F (the average Grandma's temperature) and 2:37:43 in this year's conditions, a difference of two minutes and 33 seconds.  Not a bad prediction!
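The arithmetic in this comparison is easy to check with a few lines of Python; the helper names are my own, and the times are the ones quoted above.

```python
def to_seconds(hms: str) -> int:
    """'h:mm:ss' -> seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def to_hms(seconds: int) -> str:
    """Seconds -> 'h:mm:ss' (with a leading minus sign for negatives)."""
    sign = "-" if seconds < 0 else ""
    h, rem = divmod(abs(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{sign}{h}:{m:02d}:{s:02d}"


baseline = to_seconds("2:35:10")   # 2005-2015 average, men's top one percentile
actual = to_seconds("2:38:28")     # 2016 result
predicted = to_seconds("2:37:43")  # formula's 2016 prediction

print("observed slowdown: ", to_hms(actual - baseline))     # → 0:03:18
print("predicted slowdown:", to_hms(predicted - baseline))  # → 0:02:33
print("prediction error:  ", to_hms(actual - predicted))    # → 0:00:45
```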

Another example: the 50th percentile in last year's men's race was 3:55:54, and the temperature last year was 54° F.  This year's 50th percentile was 4:25:49; the formula predicts a 4:22:46 (converting back to ideal conditions, then to this year's conditions of 70° F).  Pretty good.  In fact, you could even argue for a higher temperature: the four-hour marathoner is out in the heat longer, so the time-weighted average temperature he or she experiences should be higher, more like 72 or 73 degrees.  That gets you even closer to the real-world result.
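The time-weighted argument can be made concrete with a small sketch. It assumes (1) the temperature rose in a straight line between the two readings quoted earlier (65° F at 7:45 am, 75° F at 10:55 am) and kept rising at the same rate afterwards, and (2) every runner crossed the start line at 7:45 am. Both are simplifications: the readings come from two different places, and runners farther back in the field start after the gun.

```python
START_CLOCK, START_TEMP_F = 7 * 60 + 45, 65.0     # 7:45 am, Two Harbors
FINISH_CLOCK, FINISH_TEMP_F = 10 * 60 + 55, 75.0  # 10:55 am, Duluth


def temp_at(clock_minutes: float) -> float:
    """Straight line through the two readings; extrapolates after 10:55 am."""
    slope = (FINISH_TEMP_F - START_TEMP_F) / (FINISH_CLOCK - START_CLOCK)
    return START_TEMP_F + slope * (clock_minutes - START_CLOCK)


def time_weighted_temp(race_minutes: float) -> float:
    """Mean temperature felt by a runner who crosses the start line at 7:45 am.

    For a linear temperature profile, the time average over the run equals
    the temperature at the run's midpoint, so no numerical integration is needed.
    """
    return temp_at(START_CLOCK + race_minutes / 2)


median_2016 = 4 * 60 + 25 + 49 / 60  # 4:25:49, men's 50th percentile
top1_2016 = 2 * 60 + 38 + 28 / 60    # 2:38:28, men's top one percentile
print(f"50th percentile: {time_weighted_temp(median_2016):.1f} °F")  # → 72.0 °F
print(f"top 1%:          {time_weighted_temp(top1_2016):.1f} °F")    # → 69.2 °F
```

Under these assumptions, a race spanning exactly 7:45 to 10:55 am sees the simple 70° F average, while the median runner lands at about 72° F.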

The rest of this article is long, boring, and appeals only to nerds like me.  Here is the gist of the next 600 words:
· Women may handle intense heat slightly better than men.
· The data in the table above are extracted from men's performances only, but women's performances should be pretty close in most cases.
· This model falls apart pretty quickly for marathon times under 2:30:00.
· Someday I might turn this into a cool web app that you can play with, but I've got a lot on my plate right now, so don't get your hopes up.

Methods: Extracting the generalized formula from El Helou et al.

The supplemental data in El Helou et al. provide data points for their parabolic model of temperature and marathon time.  By extracting these points and fitting a parabola to each set, it's possible to develop a general formula for predicting how much slower you'll run under suboptimal conditions.
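For anyone who wants to reproduce this step, here is a minimal least-squares parabola fit using only the Python standard library (numpy.polyfit(x, y, 2) does the same job in one call if NumPy is available). The sample points are made up for illustration; they are not values from El Helou et al.'s supplemental data.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs")
    # Normal equations: s[k] = sum of x**k, t[k] = sum of x**k * y
    s = [sum(x**k for x in xs) for k in range(5)]
    t = [sum(x**k * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns in the order (c, b, a)
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        known = sum(m[r][k] * sol[k] for k in range(r + 1, 3))
        sol[r] = (m[r][3] - known) / m[r][r]
    c, b, a = sol
    return a, b, c


# Made-up (temperature °F, finish minutes) points, NOT El Helou et al.'s data
temps = [35, 45, 55, 65, 75]
times = [161.0, 160.2, 162.1, 166.8, 173.9]
a, b, c = fit_parabola(temps, times)
print(f"a = {a:.5f}, b = {b:.4f}, c = {c:.2f}, vertex at {-b / (2 * a):.1f} °F")
```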

Any parabola has the form y = ax² + bx + c, where x here is the temperature.  That includes the best-fit parabola for each discrete marathon time provided in the El Helou et al. paper (derived from the expected time in optimal conditions for the 1st, 25th, 50th, and 75th percentiles of male finishers).
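Because the optimal temperature is the vertex of each parabola, it follows directly from the three constants. If y is finish time (so a > 0), the vertex is the fastest expected time; if y were speed instead (a < 0), it would be the top speed, and the algebra is identical:

```latex
\frac{dy}{dx} = 2ax + b = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{b}{2a},
\qquad
y(x^{*}) = c - \frac{b^{2}}{4a},
\qquad
y = a\,(x - x^{*})^{2} + y(x^{*})
```

The last form makes the penalty explicit: racing at temperature x costs a(x - x*)² relative to the optimum.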

It is reasonable to expect that the shape of this parabola changes smoothly and predictably with the speed at which you run.  In other words, even if deviations from the optimal temperature affect faster runners more or less severely than slower runners, those changes probably occur smoothly on a continuum.  Because of this, we can develop functions that describe how the constants of the parabola (a, b, and c) change as a function of your running speed under optimal marathon conditions.

As noted above, it isn't safe to assume that men and women change the same way, so the "shape trends" of each parabola must be treated separately by sex.  Because I don't have valid data for the top 1% of women (given how large an outlier that group was), I can only develop a general formula for men.  That said, the results should still be pretty good for most female runners in most conditions; the men's and women's curves line up pretty well at the 25th and 50th percentiles.

In brief, here is a look at the regression analysis I used to determine the general formula describing the shape of the speed deviation parabola for male marathoners:

Note that changes in a, the first constant in the formula, appear to be proportional to the square of your running speed, while b and c appear to follow linear trends.

I was trained as a chemist.  In chemistry, we have a rule: "If it doesn't look linear, don't do a linear fit!" As such, I used a quadratic fit for changes in the constant a.  This area of math is interesting and perplexing to me—is this model of smooth continuous parabola shape change really accurate? You'd need a lot more data to validate this.  It would be a cool project, but would be extremely time-intensive.

In any case *waving hands*, let's just assume it is.

This yields a general formula that gives you the equation of the race time parabola for a particular goal marathon pace.  From the linear graph earlier, we already know the optimal temperature for any given marathon pace, so it's easy to double-check the general formula by plugging in a known marathon time and seeing where the vertex of the parabola ends up.  There are, of course, some small differences because of rounding errors and because the data are empirically derived.
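Here is a minimal sketch of how such a general formula and its vertex double-check could be wired together. It assumes the trends are written in terms of average speed in km/h (the post doesn't say which speed unit was used), and every trend coefficient is an invented placeholder, not one of the author's fitted values, so only the structure carries over.

```python
MARATHON_KM = 42.195


def speed_kmh(ideal_minutes: float) -> float:
    """Average speed for a marathon run in ideal_minutes."""
    return MARATHON_KM / (ideal_minutes / 60)


# PLACEHOLDER trend coefficients, invented so the example runs; they are NOT
# the author's fitted values.
A_TREND = (2.0e-5, 0.0, 5.0e-4)  # a(v) = A0*v**2 + A1*v + A2  (quadratic)
B_TREND = (0.01, -0.54)          # b(v) = B0*v + B1            (linear)
C_TREND = (-10.0, 326.4)         # c(v) = C0*v + C1            (linear)


def parabola_for(ideal_minutes, a_trend=A_TREND, b_trend=B_TREND, c_trend=C_TREND):
    """Constants (a, b, c) of time = a*T**2 + b*T + c for one ideal time."""
    v = speed_kmh(ideal_minutes)
    a = a_trend[0] * v**2 + a_trend[1] * v + a_trend[2]
    b = b_trend[0] * v + b_trend[1]
    c = c_trend[0] * v + c_trend[1]
    return a, b, c


def vertex_check(ideal_minutes, expected_opt_f, tol_f=1.0):
    """Compare the parabola's vertex with the optimum read off the linear graph."""
    a, b, _ = parabola_for(ideal_minutes)
    vertex_f = -b / (2 * a)
    return vertex_f, abs(vertex_f - expected_opt_f) <= tol_f


# 15 km/h is roughly a 2:48:47 marathon; 39 °F is only an illustrative target
print(vertex_check(MARATHON_KM / 15 * 60, expected_opt_f=39.0))
```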

Finally, we can create a master formula that can predict your actual marathon time based on the temperature during the race and your optimal marathon performance in ideal conditions.  This allows you to work backwards too, answering questions like "If I ran 3:06:00 in 66 degree weather, what could I run in 55 degree weather?" (answer: 3:00:27).
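The master formula itself isn't published here, so the following is only a sketch of its forward-and-backward structure under stated assumptions: the optimal temperature comes from a straight line through the two values quoted earlier (39° F for 2:40, 45° F for 4:30), and the slowdown is a single parabola with a hypothetical curvature K_HYPOTHETICAL rather than the author's fitted a, b, and c trends. Working backwards is done by bisection on the ideal time.

```python
def optimal_temp_f(ideal_min: float) -> float:
    """Line through the two quoted points: 39 °F at 2:40, 45 °F at 4:30."""
    return 39.0 + (45.0 - 39.0) / (270.0 - 160.0) * (ideal_min - 160.0)


# HYPOTHETICAL curvature: fractional slowdown per squared °F away from optimal.
K_HYPOTHETICAL = 1.0e-4


def predicted_time(ideal_min: float, temp_f: float, k: float = K_HYPOTHETICAL) -> float:
    """Forward model: finish time in minutes at temp_f for a given ideal time."""
    return ideal_min * (1 + k * (temp_f - optimal_temp_f(ideal_min)) ** 2)


def ideal_from_result(actual_min, temp_f, k=K_HYPOTHETICAL, tol=1e-6):
    """Backward model: bisect for the ideal time that reproduces actual_min.

    Assumes predicted_time rises with ideal_min over the bracket, which holds
    for this placeholder k at realistic temperatures.
    """
    lo, hi = 0.5 * actual_min, actual_min  # predicted_time(hi) >= actual_min
    if predicted_time(lo, temp_f, k) > actual_min:
        raise ValueError("lower bracket too high for this k and temperature")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predicted_time(mid, temp_f, k) < actual_min:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def equivalent_time(actual_min, temp_f, new_temp_f, k=K_HYPOTHETICAL):
    """Convert a result at temp_f into the expected result at new_temp_f."""
    return predicted_time(ideal_from_result(actual_min, temp_f, k), new_temp_f, k)


def fmt(minutes: float) -> str:
    """Minutes -> 'h:mm:ss', rounded to the nearest second."""
    h, rem = divmod(round(minutes * 60), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


# The post's example question: 3:06:00 at 66 °F, re-run at 55 °F. With the
# placeholder k this will NOT reproduce the author's 3:00:27.
print(fmt(equivalent_time(186.0, 66, 55)))
```

Bisection is used because it needs nothing beyond the forward model being monotonic over the bracket; with the real coefficients, a closed-form inversion may or may not be practical.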

There are some major limitations to the model, though.  It's clearly not valid for times under about 2:30:00, where the predictions start to fall apart: as you approach elite men's times, the model starts to predict faster times in hotter temperatures.  This is probably a consequence of the assumptions I made and the limited data available when deriving how the shape of the time-versus-temperature parabola changes with speed; El Helou et al.'s data only extend down to 2:41 marathoners.  Extrapolating 30 or 40 minutes faster than that lands you in uncharted and inaccurate territory.  More data analysis is needed to refine the model so it can be used for elite marathon times, too.  I'm also not comfortable using the model for women running under about three hours, because of the aforementioned issue with the top one percentile of female finishers.  I'll have a better idea of what's going on there once I finish my own analysis of top female times at other races and confirm or refute my surface area/body mass hypothesis.

Despite this, the formula proves immensely useful and quite accurate for the 2:30 to 4:00 marathon crowd.  Unfortunately for you, the generalized version of the formula currently only works in a very messy Excel spreadsheet on my computer.  Someday, I hope to develop a simple web app that I can embed in this article that allows you to play with the numbers in an easy, interactive way.  But that would require dusting off my JavaScript programming skills, which seems like a low priority right now.  If you really want to sink your teeth into this, email me and I'll send you the raw data.


  1. Wow, that was dead on. I was shooting for a 3:30 and got a 3:48 at Grandma's

    1. Same. Was going for 3:25-3:30 and finished 3:45.

  2. Hey John!!!
    As always, a very interesting article!!!!
    Unfortunately, I don't have any data from a cold marathon....
    Last year at the Porto Marathon, in a very hot and sunny race, I ran 2:45 when I was supposed to be under 2:40.... I wonder how it would have gone in optimal conditions.
    Five months later I ran 2:39 at the Hamburg Marathon, and I was not as well trained as I was for Porto.
    In October I'll go to Chicago; I hope the weather conditions will be nice for running.
    As a chemist, I'm very interested in this stuff, so I'll contact you by email.
