Now that the outdoor track season has finished up
here in Minnesota, both for high school and college athletes, I decided to do a
follow-up on my article about developing race time conversions. Several months ago, I posted a tutorial on
how to use a spreadsheet program (like Excel) to do linear regression on performance lists in order to develop your own conversions between race
distances. This is a nice exercise in statistics, but it can also be very useful if you need a conversion for an uncommon race that's unique to your state or collegiate conference. It allows you to answer questions like "what kind of 1500m performance is necessary for a 15:00 5k?", and by doing the statistics yourself, you can also get a good idea of the *range* of performances over and under a particular race distance. In other words, you can see how reliable a conversion is.
Much like last time, I used the performance lists
from the 2013 Minnesota Intercollegiate Athletics Conference (MIAC) outdoor
track season. For comparison, I also ran
some statistics on the Minnesota State High School League's honor roll list. Unfortunately, since the honor
roll only accepts performances better than a certain threshold, our statistical power
to predict race performances is significantly lower, because we do not get as
wide a range of performances.

For the impatient, the most salient results are summarized in the table below. You can apply these multiplicative conversions to one performance to predict another over a different race distance. Since they are simple multiplications, you can also stack them to convert, say, a 1500 to a 10k by first multiplying by the 1500m to 5k conversion, then multiplying again by the 5k to 10k conversion. Be aware that stacked conversions are less reliable, since the error in each step carries into the next.
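
To make the stacking idea concrete, here is a minimal Python sketch. The two factor values are placeholders I made up purely for illustration (they are not the conversions from the table), and the helper names `parse_time`, `format_time`, and `stack` are my own; swap in whichever factors you trust.

```python
def parse_time(text):
    """Convert a time like '4:00.0' (or plain seconds like '52.3') to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def format_time(seconds):
    """Format seconds as M:SS.s, rounding to the nearest tenth."""
    tenths = round(seconds * 10)
    minutes, remainder = divmod(tenths, 600)
    return f"{minutes}:{remainder / 10:04.1f}"


def stack(time_seconds, factors):
    """Apply several multiplicative conversions one after another."""
    result = time_seconds
    for factor in factors:
        result *= factor
    return result


# PLACEHOLDER factors (hypothetical, not taken from the article's table).
FACTOR_1500_TO_5K = 3.7
FACTOR_5K_TO_10K = 2.1

t_1500 = parse_time("4:00.0")
t_10k = stack(t_1500, [FACTOR_1500_TO_5K, FACTOR_5K_TO_10K])
print("Predicted 10k:", format_time(t_10k))
```

Because multiplication is order-independent, stacking is the same as multiplying once by the product of the factors; the loss of reliability comes from the data behind each factor, not from the arithmetic.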

Some conversions (e.g. 800m to 1600m) are missing due to a lack of complete data; see below for more.

For the more mathematically inclined, we'll go further in depth on analyzing the results. For every race conversion from 400m up, I performed linear regression on the race performances of people who had recorded performances in both events. So, for the men's 400m to 800m conversion, I found the linear equation (of the form *y = a x + b*) that best fit the data. I also calculated the r² value, or the coefficient of determination. This value can be pictured as the percentage of the variance in the longer race performance (over, say, 800m) that is predicted by performance over the shorter distance (e.g. 400m).
The full results are presented in the table below.
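
For anyone who would rather script the fitting step than use a spreadsheet, here is a minimal Python sketch of the same calculation: an ordinary least-squares fit of *y = a x + b* and its r² value. The paired times are made up for illustration (they are not MIAC data), and the function names `fit_line` and `r_squared` are my own.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b. Returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def r_squared(x, y, a, b):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# MADE-UP paired season bests in seconds: one entry per athlete who ran both.
times_400 = [50.0, 51.0, 52.0, 53.0, 54.0]
times_800 = [114.0, 117.0, 117.0, 121.0, 121.0]

a, b = fit_line(times_400, times_800)
r2 = r_squared(times_400, times_800, a, b)
print(f"800m = {a:.2f} * 400m + {b:.2f}, r^2 = {r2:.2f}")
```

The important detail is the pairing: only athletes with a mark in both events contribute a point, which is why thin performance lists leave so little to fit.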

If you read my first post on using statistics to analyze performance lists, you'll remember that we found race conversions to be
the most reliable when comparing races that are *physiologically similar*, meaning that they rely on the various energy systems of the body to similar extents. Because of the way the body breaks down energy expenditure in middle and long-distance races, this leads to some curious results in our statistics. Converting from 5k to 10k, for example, is significantly more reliable than converting a 400 to an 800. In the outdoor track times, I also found that wider ranges in times generally resulted in more reliable conversions. The women's 5k to 10k conversion, for example, is much more reliable than the men's conversion. This might just be a statistical artifact, given that the data from the women was distributed over a broader range of times, though it is also possible that women do truly run more "predictable" 10ks when considering their 5k race performance.
When comparing the "full" linear regression model against the simple model (which forces the intercept of the equation to zero, allowing for simple multiplication conversions), the only case in which there was a significant gain in predictability was the men's 5k to 10k conversion. There, using the full linear model resulted in an absolute gain of 8.7 percentage points in predictive power: the full linear model explains 78.0% of the variance in 10k performance based on 5k time, while the simple model explains only 69.3%. The women's 1500m to 5k conversion showed a 3.0-point gain, but I'm comfortable calling this insignificant.
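
Here is a rough Python sketch of that comparison: fit both the full model (*y = a x + b*) and the simple multiplier (*y = k x*), then score each with r² measured against the mean of the longer-race times. I am assuming the usual definition r² = 1 - SS_res / SS_tot; the times are made up for illustration, and the function names are my own.

```python
def fit_full(x, y):
    """Least-squares fit of y = a*x + b. Returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_through_origin(x, y):
    """Least-squares fit of y = k*x with the intercept forced to zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def r_squared(y, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def compare_models(x, y):
    """Return (r2_full, r2_simple) for the full and multiplier-only models."""
    a, b = fit_full(x, y)
    k = fit_through_origin(x, y)
    r2_full = r_squared(y, [a * xi + b for xi in x])
    r2_simple = r_squared(y, [k * xi for xi in x])
    return r2_full, r2_simple


# MADE-UP paired 5k and 10k times in seconds (not real performance-list data).
times_5k = [900.0, 930.0, 960.0, 990.0, 1020.0]
times_10k = [1890.0, 1940.0, 2010.0, 2080.0, 2130.0]

r2_full, r2_simple = compare_models(times_5k, times_10k)
print(f"full model r^2:   {r2_full:.3f}")
print(f"simple model r^2: {r2_simple:.3f}")
print(f"gain: {100 * (r2_full - r2_simple):.1f} percentage points")
```

The full model can never score lower than the simple one, since it has an extra parameter to work with. When the data barely trends at all, the forced-through-zero line can fit worse than simply guessing the mean for everyone, and this formula then returns a negative number.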

Of most interest to me were the particularly poor results in the men's 400m to 800m conversion. Using the times recorded on the performance lists, there is almost *no* predictive power in knowing a particular runner's 400m time. Only seven percent of the variability in 800m times could be explained by 400m time, and when forcing a simple multiplication factor, the predictive power is zero (technically negative, due to the way r² is calculated). This could mean one of two things: either our data is incomplete, or there really is a very weak relationship between 400m time and 800m time. One reason our data could be bad is that relatively few 800m runners actually run the open 400m often during outdoor track. With plenty of opportunities to run a 400 at the end of a meet in the 4x4, it doesn't make a whole lot of sense to get tired in an open 400 before your prime event (the 800). Strangely though, the women did not suffer from this same problem.
When looking at a graph, it appears that many 800m
runners are able to run quite fast (under 2:00) with open 400s ranging from
50.0 to almost 55 seconds. I'd have to
look at more data—probably lists of season bests which included relay splits—to
determine whether this is a statistical artifact or a real physiological
phenomenon.

Because of the relative difficulty of getting onto
the MSHSL high school performance lists, our predictive power in high school
races is much lower. There weren't even
enough people on both the 400 and 800 lists to justify running an analysis, and
the 800 to 1600 conversions are woefully unhelpful. The 1600 to 3200 conversion is respectable,
at least for the boys. Performance among
girls in the 3200 appears to be much more variable than it is for boys; the
reasons for this are not clear. It could
have to do with more variance in training, or it could be another statistical
artifact.

So, aside from developing some helpful conversion
factors, what have we learned? For one, open 400 performance doesn't have a
very strong relationship with 800m performance, at least for men. Second, a simple multiplication conversion is
almost as good as a full linear regression model for most race conversions, the
only standout being converting a men's 5k to a 10k. And finally, as we saw with indoor races,
conversions work best between races that are __physiologically similar__; even if a race is very close in distance to another (like the 400m and 800m), if they rely on differing energy systems, our conversions will be much less successful!
I like all the great math John! That's pretty crazy how men have no predicability 400-800 but women do. Could that be because women tend to run multiple events more and might to open 400 and 800 in the same race? Just throwing out ideas. Happy 4th of July!

Chas

My bad for the typos. I meant "Could that be because women tend to run multiple events more and might do the open 400 and the 800 in the same meet?"
