Tuesday, May 19, 2015

Twin Cities 1 Mile in retrospect: How slow was it? A statistical analysis



2015 TC 1 Mile Champions Garrett Heath and Heather Kampf

The Medtronic Twin Cities 1 Mile is one of the premier road miles in the United States.  It has hosted Olympic medalists and has borne witness to several sub-four-minute miles.  In addition to a top-flight pro race, the TC 1 Mile features several "open" waves, which usually total over two thousand finishers.  The traditional course was flat and very fast.  This year, the installation of a new light-rail transit line forced the course through downtown Minneapolis to be changed, likely permanently. 

This course change was announced a few months ago, and after researching the elevation profile of the new course, which gains about 30 feet of elevation in the first half mile before flattening out, I published an article in which I predicted the new course would be five to eight seconds slower. 

The race itself, which happened last Thursday, was held on a cool, rainy evening with slight winds.  Weather data pegs the exact conditions at 54 degrees F, light rain, and 9 mph winds at race time: certainly not conducive to the very fastest times, but not terrible.  Garrett Heath (a Minnesota native) won in 4:08, a sharp contrast to Nick Willis' blistering 3:56 course record the last time the race was held.  Heath himself was runner-up in that race with a 3:57.

By looking just at the pro results, the new course looks substantially slower than the old one, but you could chalk this up to cautious tactics early in the race, or just a fluke from a small sample size.  To get a real answer on how much slower the new course was, and how accurate my prediction was, we'll have to do some statistical analysis.

The rest of this article goes into detail on the methods I used to compute how slow the course actually was, but if you're just looking for a quick conversion, here it is: For competitive runners, the 2015 TC 1 Mile was 13 ± 3 seconds slower than the 2013 course. A more accurate conversion is to multiply your 2015 race time by 0.9581 to get the equivalent 2013 time, and to multiply your 2015 time by 0.009 to get the uncertainty in seconds.
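For anyone who would rather not do the arithmetic by hand, here is a minimal Python sketch of that quick conversion. It only encodes the 0.9581 ± 0.009 figures above. The helper names (parse_time, format_time, to_2013_equivalent) are hypothetical and exist just for illustration.

```python
# Quick conversion: 2015 time x 0.9581 = 2013-course equivalent,
# with +/- (0.009 x 2015 time) as the uncertainty.
# Helper names are hypothetical, for illustration only.

MULTIPLIER = 0.9581      # linear-regression multiplier from this analysis
MULTIPLIER_ERR = 0.009   # its ~95% margin

def parse_time(mmss: str) -> float:
    """Convert 'm:ss.s' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)

def format_time(seconds: float) -> str:
    """Convert seconds to 'm:ss.s' (rounded to tenths first to avoid '3:60.0')."""
    minutes, rem = divmod(round(seconds, 1), 60)
    return f"{int(minutes)}:{rem:04.1f}"

def to_2013_equivalent(time_2015: str) -> tuple[str, float]:
    """Return the 2013-course equivalent time and its +/- margin in seconds."""
    t = parse_time(time_2015)
    return format_time(t * MULTIPLIER), round(t * MULTIPLIER_ERR, 1)

print(to_2013_equivalent("4:08.3"))  # prints ('3:57.9', 2.2)
```

Heath's 4:08.3 is used as the example because it reproduces the 3:57.9 ± 2.2 figure discussed later in the article.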


Statistical methods of comparing the two courses

Broadly, there are two ways of going about looking at the relative competitiveness of a given time on the old course versus the new one.  The first and simpler method is a place-by-place analysis: compare both races and analyze what it took for 1st place, 10th place, 50th place, and so on.  This looks merely at how competitive a certain finishing position is, with no regard to how specific runners performed year-to-year.  While simple, this method is vulnerable to being swayed by the size of the race.  Perhaps because of the poor weather, this year's TC 1 Mile had only 1,600 finishers compared to 2013's 2,500 (the race was cancelled last year because of stormy weather).  This might shift the bell curve of competitiveness towards slower times.  In some years, the TC 1 Mile has also hosted the USATF Road Mile Championships, which would draw a more competitive field.

A workaround for this would be to compare the race times of individual runners who competed in both races.  Though some runners are surely in better shape this year, some are also slower, so when averaged out over many, many runners, this gives us another way of looking at the relative speed of the course.

Another question to ponder is whether we want to develop a simple conversion that uses a static value to "convert" from one course to the other (e.g. "The 2015 course is x seconds slower") or whether we want to use a multiplier ("The 2015 course is y percent slower"). Strictly speaking, the multiplier method should be more accurate, since runners of different speeds are on the course for longer or shorter periods of time.  A four-minute miler is probably not going to be slowed down as much as a six-minute miler.
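The difference between the two styles is easy to see with a few numbers. The sketch below uses the place-by-place factors reported later in this article (10.7 seconds static, 0.9649 multiplier). The function names are hypothetical, and all times are 2015 times in seconds.

```python
# Static vs. multiplier conversion, using the place-by-place factors
# from this analysis. Function names are hypothetical.

STATIC_SLOWDOWN = 10.7   # seconds slower in 2015, same for everyone
MULTIPLIER = 0.9649      # 2015 time x this = 2013 equivalent

def static_equivalent(t_2015: float) -> float:
    """2013 equivalent under a fixed-seconds conversion."""
    return t_2015 - STATIC_SLOWDOWN

def multiplier_equivalent(t_2015: float) -> float:
    """2013 equivalent under a percentage conversion."""
    return t_2015 * MULTIPLIER

for t in (240.0, 300.0, 360.0):   # 4:00, 5:00 and 6:00 in 2015
    print(f"{t:.0f} s: static credit {t - static_equivalent(t):.1f} s, "
          f"multiplier credit {t - multiplier_equivalent(t):.1f} s")
```

The static conversion credits every runner the same 10.7 seconds. The multiplier credits a 6:00 runner about 12.6 seconds and a 4:00 runner only about 8.4, which matches the intuition that slower runners spend longer on the hill.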

Instead of picking one way to process the data, I decided to run several combinations of methods and compare the results.  Surprisingly, I found that any reasonable way of comparing the courses gives about the same result, with roughly the same margin of error!

The data and the results

What follows is a brief overview of the methods used to calculate the conversion factors for all seven statistical methods I tested. 

Place-by-place analysis: static and multiplier

For the place-by-place model, I examined 1st, 10th, 20th, 50th, 100th, and 250th place finishing times from both races. For each place, I either subtracted the 2013 time from the 2015 time to get a static conversion factor (a certain number of seconds), or I divided the 2013 time by the 2015 time to get a multiplier conversion factor (a percentage).  The places I chose were of course arbitrary; I stopped at 250th because I wanted to limit the model to relatively fast runners (250th place this year was a hair over six minutes).  In all cases, the 2015 time for a given finishing place was slower than in 2013.  The confidence intervals (~95%) were developed by using twice the standard error of the sample mean (I'm pretty sure this is the correct method, but it's been quite a while since I've taken a formal stats course, so by all means let me know if I've veered off course!).

Result:

Static: The 2015 course was 10.7 ± 3.7 seconds slower
  
Multiplier: Your 2015 time multiplied by 0.9649 ± 0.008 gives the equivalent 2013 time
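The place-by-place calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the spreadsheet I actually used: the function names are hypothetical, and each year's results are assumed to be a list of finishing times in seconds, sorted fastest first. With only six places, a Student's t multiplier (about 2.57 for five degrees of freedom) instead of 2 would give a somewhat wider interval. The k parameter is there if you want to try that.

```python
import statistics

# Finishing places compared in the place-by-place model.
PLACES = (1, 10, 20, 50, 100, 250)

def mean_with_margin(values, k=2.0):
    """Sample mean and an approximate 95% margin: k x standard error of the mean."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / len(values) ** 0.5
    return mean, k * standard_error

def place_by_place(times_2013, times_2015, places=PLACES):
    """Compare the time needed for each finishing place in the two years.

    times_2013, times_2015: finishing times in seconds, sorted fastest first
    (each list needs at least max(places) entries).
    Returns ((static_mean, static_margin), (multiplier_mean, multiplier_margin)).
    """
    # Static factor: seconds slower in 2015 at each place.
    diffs = [times_2015[p - 1] - times_2013[p - 1] for p in places]
    # Multiplier: 2015 time x ratio gives the 2013 time at that place.
    ratios = [times_2013[p - 1] / times_2015[p - 1] for p in places]
    return mean_with_margin(diffs), mean_with_margin(ratios)
```

To use it, pass the two years' sorted result lists. For example, calling `place_by_place(times_2013, times_2015)` returns both conversion factors with their margins.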


Descriptive statistics: static and multiplier

When we talk about "descriptive statistics," we're talking about the mean, standard deviation, and standard error of a population of measurements.  In the present case, that population is the set of runners who competed at the TC 1 Mile in both 2013 and 2015.  As with the place-by-place analysis, I restricted my data to runners who finished in the top 250 places in both years.  This left some 65 runners whose performances I could analyze.  Of these, only ten (15%) ran faster in 2015 than they did in 2013.  For the static conversion, I computed each runner's time difference between 2015 and 2013 (including the ten negative values) and determined the standard error of those differences.  For the multiplier, I divided each runner's 2013 time by their 2015 time and likewise calculated error margins.

Note that, for fast runners, this method gives the most generous conversion of all the statistical methods used.  You might think that this is because of the inherent flaw of a static conversion (a six-minute miler is on the course a lot longer than a four-minute miler, so a static conversion will be overly generous for the faster runner). However, even if you only look at runners who ran under five minutes in 2013, their mean slowdown in 2015 was actually larger (14.0 seconds).

Result:

Static: The 2015 course was 13.0 ± 2.9 seconds slower

Multiplier: Your 2015 time multiplied by 0.9601 ± 0.009 gives the equivalent 2013 time
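The same-runner comparison is sketched below in Python. The function names and the data layout are assumptions: each year's results are taken to be a dict that maps a runner key (for example, a name) to (place, time in seconds). Real results need some care when matching runners, because names can be duplicated or misspelled between years.

```python
import statistics

def mean_with_margin(values, k=2.0):
    """Sample mean and an approximate 95% margin: k x standard error."""
    standard_error = statistics.stdev(values) / len(values) ** 0.5
    return statistics.mean(values), k * standard_error

def paired_conversion(results_2013, results_2015, max_place=250):
    """Same-runner comparison between the two years.

    results_2013, results_2015: dicts mapping a runner key -> (place, seconds).
    Keeps only runners placing in the top `max_place` in both years.
    Returns (static, multiplier, faster_in_2015), where static and multiplier
    are (mean, margin) pairs.
    """
    both = [key for key in results_2013
            if key in results_2015
            and results_2013[key][0] <= max_place
            and results_2015[key][0] <= max_place]
    # Positive difference = slower in 2015.
    diffs = [results_2015[key][1] - results_2013[key][1] for key in both]
    # 2013 / 2015 ratio, so that 2015 time x ratio = 2013-equivalent time.
    ratios = [results_2013[key][1] / results_2015[key][1] for key in both]
    faster_in_2015 = sum(1 for d in diffs if d < 0)
    return mean_with_margin(diffs), mean_with_margin(ratios), faster_in_2015
```

The count of runners who were faster in 2015 is returned alongside the two conversions. For this year's data, that count corresponds to the ten negative differences mentioned above.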


Linear regression: multiplier

A more statistically rigorous way of determining a multiplier conversion factor is linear regression.  Instead of averaging the individual ratios, we use least squares to find the line that best fits our data.  A full linear regression model would be a line function like you might remember from algebra class, with the form y = mx + b, where y and x are your 2013 and 2015 times, respectively, m is the slope of the line, and b is its y-intercept.  In this context those numbers don't really mean anything; they're just a model.

A simpler approach is to force the y-intercept to equal zero, which reduces the equation to y = mx. This makes the conversion process easier, but we need to make sure we aren't giving away too much in the way of statistical power by making this simplification.  We can check this by comparing the coefficient of determination, also called the R² value, which tells us how much of the variation in times from 2013 to 2015 can be explained by our model.  In this case, the full linear model (with a y-intercept) has an R² of 0.767, while the simple linear model (no y-intercept) has an R² of 0.726, meaning we lose only 4.1 percentage points of explanatory power.  That's pretty good. Better still, the no-intercept model lets us compare the linear regression conversion directly with the simple mean results from the other models.
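Here is a minimal pure-Python sketch of both fits, with x as each runner's 2015 time and y as their 2013 time. The function names are hypothetical. The sketch uses the conventional centered definition of R² (1 − SS_res / SS_tot around the mean of y) for both models. Statistics packages and spreadsheets do not all define R² the same way for a no-intercept fit, so these numbers may not match Excel's output exactly.

```python
def fit_full(x, y):
    """Ordinary least squares for y = m*x + b. Returns (m, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def fit_through_origin(x, y):
    """Least squares with the intercept forced to zero: y = m*x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def r_squared(x, y, m, b=0.0):
    """Centered R^2 = 1 - SS_res / SS_tot for the line y = m*x + b."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

The slope from `fit_through_origin` plays the role of the multiplier: a 2015 time times that slope gives the 2013 equivalent. Comparing `r_squared` for the two fits is the check described above.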

I used linear regression to determine a multiplier only (and not a static value) because finding the analogous static value is harder (meaning, not automated in MS Excel).  I'm sure there is a one-dimensional analogue to least squares (minimum distance, probably), but I don't see much value in manually computing this.  In fact it might just be mathematically identical to the sample mean, but I'm not sure.  At this point, I'd rather keep poring over results than do that proof!
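For what it's worth, the proof turns out to be short, and the hunch holds. Suppose each runner's 2013 time y_i is modeled as their 2015 time x_i plus a constant offset c. Minimizing the squared error over c gives the mean of the per-runner differences:

```latex
% Static model: y_i = x_i + c  (y_i = 2013 time, x_i = 2015 time)
\[
S(c) = \sum_{i=1}^{n} \left( y_i - x_i - c \right)^2,
\qquad
\frac{dS}{dc} = -2 \sum_{i=1}^{n} \left( y_i - x_i - c \right) = 0
\]
\[
\Longrightarrow \quad \hat{c} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i \right)
\]
```

So on the same set of runners, the least-squares static offset is exactly the sample mean of the differences. Its sign is flipped, because c is defined as the 2013 time minus the 2015 time. That means the descriptive-statistics static figure is already the least-squares answer.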

Result:

Multiplier: Your 2015 time multiplied by 0.9581 ± 0.009 gives the equivalent 2013 time.


Pro results only: static and multiplier

Maybe you're a snob who thinks that the pitiful peons in the open race may have been slowed by the uphill course, but the pros are a different breed.  To answer that challenge, I looked at only professional runners who competed in both the 2013 and the 2015 race.  There were only six: Garrett Heath, Craig Miller, Jonathan Peterson, Scott Smith, Heather Kampf, and Meghan Peyton.  Despite this, the results for both static and multiplier conversions were astoundingly similar to the results from the much larger sample of open runners (and place-by-place analysis)!

Result:

Static: The 2015 course was 10.2 ± 2.9 seconds slower

Multiplier: Your 2015 time multiplied by 0.9616 ± 0.012 gives the equivalent 2013 time




All conversions summarized

The table above presents the results of all seven conversion methods I developed.  By visual inspection, and by plugging in different race times, you can see that their results are all very similar and the margins of error are relatively small.
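If you want to plug in your own time, the Python sketch below collects the seven factors reported in this article and applies each one to a 2015 time given in seconds. The dictionary layout and the function name are hypothetical. Static factors are seconds to subtract, and multipliers are applied to the 2015 time.

```python
# The seven conversions from this analysis: (kind, value, +/- margin).
# static: 2013 equivalent = 2015 time - value
# multiplier: 2013 equivalent = 2015 time x value
CONVERSIONS = {
    "Place-by-place, static":        ("static", 10.7, 3.7),
    "Place-by-place, multiplier":    ("multiplier", 0.9649, 0.008),
    "Descriptive, static":           ("static", 13.0, 2.9),
    "Descriptive, multiplier":       ("multiplier", 0.9601, 0.009),
    "Linear regression, multiplier": ("multiplier", 0.9581, 0.009),
    "Pros only, static":             ("static", 10.2, 2.9),
    "Pros only, multiplier":         ("multiplier", 0.9616, 0.012),
}

def equivalent_2013(t_2015):
    """Return {method: (2013-equivalent seconds, +/- seconds)} for a 2015 time."""
    out = {}
    for method, (kind, value, err) in CONVERSIONS.items():
        if kind == "static":
            out[method] = (t_2015 - value, err)
        else:
            out[method] = (t_2015 * value, t_2015 * err)
    return out

for method, (t, e) in equivalent_2013(300.0).items():   # a 5:00.0 in 2015
    print(f"{method:32s} {t:6.1f} s +/- {e:.1f} s")
```

For a 5:00.0 run in 2015, all seven estimates land between roughly 4:47 and 4:50, which is the agreement the table shows.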

Discussion and limitations

All in all, I was fairly surprised by how well the final results from the seven different statistical models agreed with each other.  Every conversion gives a time that falls within the confidence intervals of all the other conversions, and the confidence intervals themselves are fairly narrow.

The one major downside to this type of analysis is that it does not tell us what about the course actually caused the ~10-second slowdown from 2013 to 2015.  It's a very good bet that the 30-foot elevation gain from start to finish played a major role, but some people might argue that the weather had a major impact.  True, it was raining, and the temperature was not ideal, but it was actually colder in 2013! Race-time temperature was 48 degrees in 2013 compared to 54 degrees this year.  You might be able to negotiate and hand-wave your way to chalking up about 2-3 seconds of the slowdown to the rain this year, but not much more than that.

You can play around with the conversion factors to your heart's content, and I won't bore you with lengthy tables comparing dozens of equivalent performances.  Instead, to illustrate how well all of these models come together, I'll present a forest plot that shows what a 4:00.0 in 2013 would be equivalent to on the 2015 course.  After seeing this, you can understand why nobody came close to a sub-four mile! Heath's 4:08.3 from this year converts to almost exactly the time he ran in 2013: the linear regression multiplier conversion pegs his performance this year as equal to 3:57.9 (plus or minus 2.2 seconds), and his actual time in 2013 was 3:57.1.  Not bad!
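Going the other way (from a 2013 time to its 2015 equivalent) means dividing by the multiplier instead of multiplying. Below is a minimal sketch for the linear regression model only; the function name is hypothetical. Because the margin sits in the denominator, the resulting interval is slightly asymmetric.

```python
MULTIPLIER, MULTIPLIER_ERR = 0.9581, 0.009   # linear-regression model

def to_2015_equivalent(t_2013):
    """Invert (2013 time = 2015 time x m). Returns (estimate, low, high) in seconds."""
    estimate = t_2013 / MULTIPLIER
    low = t_2013 / (MULTIPLIER + MULTIPLIER_ERR)    # larger multiplier -> faster
    high = t_2013 / (MULTIPLIER - MULTIPLIER_ERR)   # smaller multiplier -> slower
    return estimate, low, high

est, low, high = to_2015_equivalent(240.0)   # a 4:00.0 mile on the 2013 course
print(f"{est:.1f} s (range {low:.1f} to {high:.1f} s)")
# prints 250.5 s (range 248.2 to 252.9 s)
```

Under this model, a 4:00.0 on the old course works out to about 4:10.5 on the new one.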



Conclusion: A new course?

For competitive runners, this year's Twin Cities 1 Mile was about 10-12 seconds slower than the previous course.  Some of this could be explained by the rainy conditions, but temperatures were not substantially different between 2013 and 2015, so the uphill route remains the best explanation for the slowdown in times.  In February, I predicted a 5-8 second slowdown based on data from submaximal treadmill tests.  If anything, I actually underestimated how slow the course would be by a few seconds! Chalk that up to the rain, I suppose.  So far, I've resisted the urge to say "I told you so," but...well, there, I said it. If you're a race director, I'm available for consulting...

Even in perfect conditions, it will remain extremely difficult to run fast times on the new TC 1 Mile course.  This is very unfortunate, as it was a very special opportunity for elite runners to be in the local spotlight.  Additionally, breaking the still-standing course records will be nigh impossible.  This will detract from the ability of Twin Cities in Motion to recruit top-flight runners, given that the $10,000 bonus for a course record is out of reach.  Heath's converted time is within a second of Willis' course record, and women's pro winner Heather Kampf's converted time was only three seconds off Sara Hall's record from 2011.  Finally, we shouldn't underestimate the marketing potential of being able to see a four-minute mile in person.  People in the general population are only vaguely aware that it is possible for a handful of humans to complete that task—imagine strolling along Hennepin Avenue on a Thursday evening, only to discover that four or five men are about to run one before your very eyes!

This is not to say that Twin Cities in Motion did a bad job.  I ran in the race this year, and TCM did an excellent job with every aspect of the event.  It was a great race and I'd love to do it again.  I do wish they'd change the course, though! Given that the new light rail schedule puts the old course permanently off-limits, selecting a new course is in order.  There are a few critical criteria. The course has to start and finish in an area that can accommodate a few thousand people, with space at the start for a staging area with port-a-potties and registration tents. It must be as flat as possible, and it should have very few turns.  For competitive professional races, the course must not have a net elevation gain of more than ten feet.

To prevent absurdly fast times, the course should not be a significant net downhill either, though drops of up to about twenty feet are permissible (as a rough rule of thumb, a downhill will speed you up by about 2/3 the amount that an equivalent uphill would slow you; by this rule, if this year's race were simply reversed, times would be about 7-ish seconds faster than 2013). 
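The rule of thumb is simple arithmetic, and the sketch below applies it to the three static estimates from this analysis. The function name is hypothetical. The sketch also assumes the entire slowdown comes from elevation, ignoring any effect of this year's rain.

```python
DOWNHILL_FACTOR = 2 / 3   # rule of thumb: downhill gain = 2/3 of the uphill loss

def reversed_course_gain(uphill_slowdown_s):
    """Seconds faster than the flat 2013 course if the uphill course were run in reverse.

    Assumes the whole slowdown is due to elevation (weather is ignored).
    """
    return DOWNHILL_FACTOR * uphill_slowdown_s

for slowdown in (10.2, 10.7, 13.0):   # static estimates from this analysis
    print(f"{slowdown:.1f} s slower uphill -> "
          f"about {reversed_course_gain(slowdown):.1f} s faster downhill")
```

The results range from roughly 7 to 9 seconds, which is consistent with the "7-ish" figure above.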

I personally think Uptown would be a great place for the TC Mile!
There are a lot of good locations in Minneapolis that could host a road mile.  If TCM wants the race to be downtown, it should be easy to set up a course heading southeast or northwest, perhaps along Washington Avenue or 6th Street.  These streets run parallel to the light rail tracks, so the trains should not be an issue.  My own humble suggestion is to consider running the race through the heart of Uptown.  A course starting and/or finishing near Calhoun Square would be a great "tour" of the Uptown area, and there are plenty of bars and restaurants to host post-race festivities.  The Uptown area plays host to several road races, but all of them just go along Lake Calhoun; none actually run on the streets!

At the very least, if TCM does not move the course next year, they should start over with a new course record.  According to the statistical models developed above, breaking Nick Willis' and Sara Hall's records would take the equivalent of 3:46 and 4:20 mile efforts, respectively, on flat ground.  We can be fairly certain that's not going to happen anytime soon.
