Thursday, July 4, 2013

Developing race conversion factors for outdoor track races



Now that the outdoor track season has finished up here in Minnesota, both for high school and college athletes, I decided to do a follow-up on my article about developing race time conversions.  Several months ago, I posted a tutorial on how to use a spreadsheet program (like Excel) to do linear regression on performance lists in order to develop your own conversions between race distances.  This is a nice exercise in statistics, but it can also be very useful if you need a conversion for an uncommon race that's unique to your state or collegiate conference.  It allows you to answer questions like "what kind of 1500m performance is necessary for a 15:00 5k?", and by doing the statistics yourself, you can also get a good idea of how widely actual performances scatter above and below the predicted time or, in other words, how reliable a conversion is.

Much like last time, I used the performance lists from the 2013 Minnesota Intercollegiate Athletics Conference (MIAC) outdoor track season.  For comparison, I also ran some statistics on the Minnesota State High School League's honor roll list.  Unfortunately, since the honor roll only accepts performances that meet a certain threshold, our ability to predict race performances from it is significantly lower, because we do not get as wide a range of performances.

For the impatient, the most salient results are summarized in the table below.  You can use these multiplicative conversions to use one performance to predict another over a different race distance.  Since they are simple multiplications, you can also stack them to convert, say, a 1500 to a 10k by first multiplying by the 1500m to 5k conversion, then multiplying again by the 5k to 10k conversion.  Be aware that stacking conversions will mean they are less reliable.


Some conversions (e.g. 800m to 1600m) are missing due to a lack of complete data; see below for more.
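As a rough sketch of how stacking multiplicative conversions works in practice, here is a short Python example. The two factors in it are placeholders I made up purely for illustration (they are not the values from this post's table), and the helper names `parse_time`, `format_time`, and `convert` are likewise just assumptions for the sketch.

```python
# Hypothetical factors for illustration only; they are NOT the values from
# this post's table. Substitute the factors you derive from your own data.
FACTOR_1500M_TO_5K = 3.70
FACTOR_5K_TO_10K = 2.08


def parse_time(text):
    """Turn 'M:SS.ss' (or plain seconds) into a number of seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def format_time(seconds):
    """Turn a number of seconds back into 'M:SS.ss'."""
    hundredths = round(seconds * 100)
    minutes, remainder = divmod(hundredths, 6000)
    return f"{minutes}:{remainder / 100:05.2f}"


def convert(seconds, *factors):
    """Apply one or more multiplicative conversion factors in sequence."""
    for factor in factors:
        seconds *= factor
    return seconds


time_1500 = parse_time("4:00.00")
predicted_5k = convert(time_1500, FACTOR_1500M_TO_5K)
# Stacking: 1500m -> 5k -> 10k is just two multiplications in a row.
predicted_10k = convert(time_1500, FACTOR_1500M_TO_5K, FACTOR_5K_TO_10K)
print("5k: ", format_time(predicted_5k))
print("10k:", format_time(predicted_10k))
```

Because each factor carries its own error, the stacked 10k prediction inherits the uncertainty of both steps, which is why stacked conversions are less reliable than a direct one.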

For the more mathematically inclined, we'll go further in-depth on analyzing the results.  For every race conversion from 400m up, I performed linear regression on the race performances of people who had recorded performances in both events.  So, for the men's 400m to 800m conversion, I found the linear equation (of the form y = a x + b) that best fit the data.  I also calculated the r² value, or the coefficient of determination.  This factor can be pictured as the percent of the variance in longer race performance (over, say, 800m) that is predicted by your performance over the shorter distance (e.g. 400m).
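For anyone who would rather script this than use a spreadsheet, here is a minimal Python sketch of the same idea: an ordinary least-squares fit of y = a x + b, plus the coefficient of determination. The paired times are invented for illustration and are not from the MIAC lists, and the function names are my own assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def r_squared(x, y, a, b):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented example data (seconds): each pair is one runner's 400m and 800m.
times_400 = [50.2, 51.0, 51.8, 52.5, 53.1, 54.0]
times_800 = [113.5, 116.0, 115.2, 118.9, 117.4, 121.0]

a, b = fit_line(times_400, times_800)
print(f"800m time = {a:.3f} * (400m time) + {b:.2f}")
print(f"r^2 = {r_squared(times_400, times_800, a, b):.3f}")
```

Only runners who appear on both lists can be used, so the first real step with actual data is matching names across the two events.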

The full results are presented in the table below.



If you read my first post on using statistics to analyze performance lists, you'll remember that we found race conversions to be the most reliable when comparing races that are physiologically similar, meaning that they rely on the various energy systems of the body to similar extents.  Because of the way the body breaks down energy expenditure in middle and long-distance races, this leads to some curious results in our statistics.  Converting from 5k to 10k, for example, is significantly more reliable than converting a 400 to an 800.  In the outdoor track times, I also found that wider ranges in times generally resulted in more reliable conversions.  The women's 5k to 10k conversion, for example, is much more reliable than the men's conversion.  This might just be a statistical artifact, given that the data from the women was distributed over a broader range of times, though it is also possible that women do truly run more "predictable" 10ks when considering their 5k race performance.

To see how much is gained by using the "full" linear regression model (rather than forcing the intercept of the equation to zero, which allows for simple multiplication conversions), I compared the two for each conversion.  The only case with a meaningful gain in predictability was the men's 5k to 10k conversion, where the full linear model resulted in an absolute gain of 8.7 percentage points in predictive power: the full linear model explains 78.0% of the variance in 10k performance based on 5k time, while the simple model explains only 69.3%.  The women's 1500m to 5k conversion showed a gain of 3.0 percentage points, but I'm comfortable calling this insignificant.

Of most interest to me were the particularly poor results in the men's 400m to 800m conversion.  Using the performance-list recorded times, there is almost no predictive power in knowing a particular runner's 400m time.  Only seven percent of the variability in 800m times could be explained by 400m time, and when forcing a simple multiplication factor, the predictive power is zero (technically negative, due to the way r² is calculated).  This could mean one of two things: either our data is incomplete, or there really is a very weak relationship between 400m time and 800m time.  One reason our data could be bad is that relatively few 800m runners actually run the open 400m often during outdoor track.  With plenty of opportunities to run a 400 at the end of a meet in the 4x4, it doesn't make a whole lot of sense to get tired before your prime event (the 800) in an open 400.  Strangely though, the women did not suffer from this same problem.
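To illustrate how a forced-through-zero fit can end up with a negative r², here is a small Python sketch. It assumes r² is computed as 1 - SS_residual / SS_total, with SS_total measured about the mean of the 800m times; spreadsheet programs may use a different convention for zero-intercept fits. The data are invented to mimic a narrow range of 400m times with a weak relationship, and are not the actual MIAC results.

```python
def r_squared(y, predictions):
    """1 - SS_residual / SS_total, with SS_total taken about the mean of y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented example data (seconds); not taken from any real performance list.
x = [50.0, 51.0, 52.0, 53.0, 54.0]       # 400m times
y = [118.0, 121.0, 117.0, 120.0, 119.0]  # 800m times

# Full model: y = a*x + b (ordinary least squares).
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
a = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
b = mean_y - a * mean_x
r2_full = r_squared(y, [a * xi + b for xi in x])

# Simple model: y = k*x, with the intercept forced to zero.
k = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
r2_origin = r_squared(y, [k * xi for xi in x])

print(f"full model:   slope {a:.3f}, intercept {b:.1f}, r^2 {r2_full:.3f}")
print(f"through zero: factor {k:.4f}, r^2 {r2_origin:.3f}")
```

A negative value here just means the forced line predicts worse than guessing the average 800m time for everyone; over a narrow band of 400m times, a line pinned to the origin is too steep to follow the data.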

When looking at a graph, it appears that many 800m runners are able to run quite fast (under 2:00) with open 400s ranging from 50.0 to almost 55 seconds.  I'd have to look at more data—probably lists of season bests which included relay splits—to determine whether this is a statistical artifact or a real physiological phenomenon. 

Because of the relative difficulty of getting onto the MSHSL high school performance lists, our predictive power in high school races is much lower.  There weren't even enough people on both the 400 and 800 lists to justify running an analysis, and the 800 to 1600 conversions are woefully unhelpful.  The 1600 to 3200 conversion is respectable, at least for the boys.  Performance among girls in the 3200 appears to be much more variable than it is for boys; the reasons for this are not clear.  It could have to do with more variance in training, or it could be another statistical artifact.


So, aside from developing some helpful conversion factors, what have we learned? For one, open 400 performance doesn't have a very strong relationship with 800m performance, at least for men.  Second, a simple multiplication conversion is almost as good as a full linear regression model for most race conversions, the only standout being converting a men's 5k to a 10k.  And finally, as we saw with indoor races, conversions work best between races that are physiologically similar; even if a race is very close in distance to another (like the 400m and 800m), if they rely on differing energy systems, our conversions will be much less successful!

2 comments:

  1. I like all the great math John! That's pretty crazy how men have no predicability 400-800 but women do. Could that be because women tend to run multiple events more and might to open 400 and 800 in the same race? Just throwing out ideas. Happy 4th of July!

    Chas

    1. My bad for the typos. I meant "Could that be because women tend to run multiple events more and might do the open 400 and the 800 in the same meet?"
