The Zombie Cyclist

Thursday, August 12, 2021

TRIMP, Intensity, and Fatigue

TRIMP, which stands for TRaining IMPulse, is a measure of training load, the amount of fatigue a ride generates. The longer the ride, the greater the fatigue. The more intense (harder, faster) the ride, the greater the fatigue. TRIMP is calculated by multiplying the length of a ride in minutes by a measure of the intensity of that ride. The different curves shown above illustrate different ways of estimating intensity. Lucia, Edwards, and Banister TRIMP are well known and well described in the literature. Gillen and Hughes are my own definitions and thus essentially unknown. I defined Gillen intensity in a previous post, and I define Hughes intensity in this post. The point of this post is to argue that the well known versions of TRIMP significantly underestimate the amount of fatigue generated by high intensity rides. (Note that the above scale is a log scale; the differences illustrated are quite large.)

“It Ain’t What You Don’t Know That Gets You Into Trouble. It’s What You Know for Sure That Just Ain’t So” - Anonymous

How does one estimate the amount of fatigue a workout generates? The standard metric used by many coaches and academics is TRIMP, which stands for TRaining IMPulse, a term that means training load. As is well known, training load produces fatigue in the short term and, when combined with recovery, increases fitness in the long term. In this post, I will only be considering the fatigue impact, and in that context, TRIMP is also synonymous with fatigue.

TRIMP is not a single metric but rather a collection of different metrics. A TRIMP score is calculated by multiplying the minutes of exercise by the intensity of that exercise, which just kicks the can down the road: how does one determine intensity? The various TRIMP metrics differ in how they estimate intensity. I wrote my previous post in this series, “Training Zones, Calories, Oxygen, and Power”, to provide the background needed to understand where the estimates used by the more common versions of TRIMP come from: the closely related metrics of heart rate, blood lactate, power, and the training zones derived from these metrics, all of which ultimately relate to calories burned per minute. In the absence of any information to the contrary, is it a reasonable guess that fatigue might be directly related to the rate at which calories are burned? Sure, why not? However, it is just as reasonable to guess that it is not. What I am going to argue here is that there is information to the contrary: the advice commonly given by coaches, based on their real world experience, implies a very different relationship between fatigue and intensity than would be predicted from the calories consumed by rides of different intensity.

How do the common versions of TRIMP estimate intensity? Edwards TRIMP is based on a heart-rate-based five-zone system and uses the zone number as the measure of intensity. Lucia TRIMP uses a blood-lactate-based three-zone system and again uses the zone number as the measure of intensity. Banister TRIMP does not use training zones but rather uses heart rate directly. In addition, it adds an exponential adjustment, which reportedly was included to make it match lactate levels more closely. The effect of this correction is relatively small, however. There is also something called individualized TRIMP. I believe this represents a family of estimates, with one source even using the term to refer to Banister TRIMP^. The purpose of this post is to provide my own estimate of intensity for use in my own version of TRIMP, an estimate based on the actual training plans provided by Coach John Hughes.
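To make “zone number as intensity” and “heart rate with an exponential adjustment” concrete, here is a minimal sketch of Edwards and Banister TRIMP as they are commonly described in the literature. The Edwards zone boundaries (50-60% through 90-100% of maximum heart rate) and the Banister constants are the commonly published values, not something taken from this post, and the function names are illustrative only.

```python
import math


def edwards_trimp(minutes_per_zone):
    """Edwards TRIMP: minutes in each of five heart-rate zones, weighted by zone number.

    minutes_per_zone[0] is time in Zone 1 (50-60% of max HR), ...,
    minutes_per_zone[4] is time in Zone 5 (90-100% of max HR).
    """
    if len(minutes_per_zone) != 5:
        raise ValueError("Edwards TRIMP uses exactly five zones")
    return sum(minutes * zone for zone, minutes in enumerate(minutes_per_zone, start=1))


def banister_trimp(minutes, hr_avg, hr_rest, hr_max, male=True):
    """Banister TRIMP: duration x heart-rate-reserve fraction x exponential weighting."""
    hr_fraction = (hr_avg - hr_rest) / (hr_max - hr_rest)
    # Commonly published constants: 0.64 * e^(1.92 x) for men, 0.86 * e^(1.67 x) for women.
    a, b = (0.64, 1.92) if male else (0.86, 1.67)
    return minutes * hr_fraction * a * math.exp(b * hr_fraction)


# A 90-minute ride: 60 minutes in Zone 2 and 30 minutes in Zone 3.
print(edwards_trimp([0, 60, 30, 0, 0]))  # 210
```

Note that in the Edwards version a minute in Zone 5 counts exactly five times as much as a minute in Zone 1, which is the linear assumption this post questions.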

This is not my first attempt to provide a different measure of Intensity. My first attempt was based on the paper I refer to as Gillen et al. That estimate was based on a 7 zone system, and I suggested that Zone 7 produced not 3.5 times the fatigue of Zone 2 but 45 times as much, i.e., that the intensities for Zone 2 and Zone 7 should be not 2 and 7 but 1 and 45. I think the relationship between intensity and fatigue is most definitely more complicated than that, and there may not even be a single number that fully represents each zone. But in the interest of not letting the best be the enemy of the good, a single number per zone is what I will develop in this post, not because I think it is perfect but because I think it is better than the more commonly used estimates. To put this into perspective, in my last post I essentially used a multiplier of 1 for all zones because I lacked the zone data to do better. Had I been able to use the zone number multiplier I am now disparaging, that would have been better than what I did. I think this is why coaches sometimes recommend a zone number multiplier: it is simple enough that their athletes might actually use it, and it is better than nothing. In that spirit, I think there is an even better multiplier that coaches could add to their training zone charts, one that would be, if not perfect, an improvement over zone number (and just as simple). In fact, I think that multiplier is implicit in their more detailed training advice, and in this post I am going to tease it out of one publication by one particular coach, the one coach I am currently following, Coach John Hughes. The main theme of this post is to compare what Coach Hughes recommends with what he would recommend if Intensity were proportional to Training Zone Number (i.e., Load = Minutes x Zone Number).
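To show how large the gap between the two multipliers is, here is a small sketch that asks how many minutes of Zone 7 carry the same load as a 90-minute Zone 2 ride under each scheme. The 90-minute ride is just an example, and the dictionary and function names are hypothetical; the only numbers taken from the text are the zone-number ratio (7 / 2 = 3.5) and the Gillen-based values of 1 and 45.

```python
# Relative intensity per zone, normalized so that Zone 2 = 1.
# Zone-number scheme: intensity proportional to zone number (Zone 7 = 7 / 2 = 3.5).
ZONE_NUMBER_INTENSITY = {2: 2 / 2, 7: 7 / 2}
# Gillen-based scheme as described in the text: Zone 2 = 1, Zone 7 = 45.
GILLEN_INTENSITY = {2: 1.0, 7: 45.0}


def load(minutes, zone, intensity):
    """Training load = minutes x relative intensity of the zone."""
    return minutes * intensity[zone]


def equivalent_minutes(minutes, from_zone, to_zone, intensity):
    """Minutes in `to_zone` that carry the same load as `minutes` in `from_zone`."""
    return load(minutes, from_zone, intensity) / intensity[to_zone]


# How many minutes of Zone 7 match the load of a 90-minute Zone 2 ride?
print(round(equivalent_minutes(90, 2, 7, ZONE_NUMBER_INTENSITY), 1))  # 25.7
print(equivalent_minutes(90, 2, 7, GILLEN_INTENSITY))                 # 2.0
```

Under the zone-number scheme, roughly 26 minutes of sprinting would match the long easy ride; under the Gillen-based scheme, 2 minutes would.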

Let’s imagine a healthy, young athlete who is a randonneur specializing in 200K brevets. Let’s imagine they select "Distance Cycling" by John Hughes and Dan Kehlenbach (hereafter referred to as Distance Cycling) as their training guide. This is the plan for preparing for a 200K brevet from Distance Cycling:

The numbers are the length of each day’s ride in minutes. The Green rides are ridden in Zone 1, the Yellow rides in Zone 2, and the Blue rides in Zone 3, but in what Zone should the Red rides be ridden? To answer that question, our randonneur turns to another publication by Coach Hughes, “Intensity Training for Cyclists” (hereafter referred to as Intensity Training). That book describes 6 training zones named Zone 1 through Zone 6. In addition to these six numbered Zones, it discusses 2 other zones named “Sweet Spot” and “Sprints”. Sweet Spot overlaps with the top of Zone 3 and the bottom of Zone 4, and Sprints are even more intense than Zone 6; they are a Zone 7, if you will. (This last point has confused me in the past, so in some of my earlier posts I refer to Zone 6 when I should have referred to the Sprint zone, Zone 7.) The imperfect but (hopefully) useful approach I will take is to look at how long the various workouts recommended by Coach Hughes are and, from that, infer how much Fatigue per minute ridden Coach Hughes thinks is produced in each zone. There are some leaps in logic required to do that, and I will take you through them. Doing so requires knowing a bit more about Coach Hughes’ training plan.

Intensity Training describes a periodized training plan consisting of fairly typical divisions into Pre-Season, Base, Build, and Main Season periods. The training plan diagrammed above describes the Build period, which is what I will be focusing on in this post. This book is designed to be flexible, to adjust to a variety of riders and goals. Our hypothetical randonneur has the ambition of riding a 200K brevet as fast as possible and so uses Coach Hughes’ “Performance Rider” plan, which includes rides in all 8 zones. All rides between Sweet Spot and Zone 6 are done on one day of the week, the “red” day. Sprints (Zone 7) are interspersed within other rides, on any day except rest (no ride) or active recovery (“green,” Zone 1) days. Rides in different training zones are designed to develop different cycling abilities. Which intensities our hypothetical randonneur will ride during their weekly “red” ride will depend on what abilities they are attempting to improve. Those abilities (along with the maximum recommended total time for each workout) are as follows:

Sweet Spot: Increase Power (longest workout: 40 minutes)

Zone 4: Increase Lactate Threshold (longest workout: 30 minutes)

Zone 5: Increase Racing Speed (longest workout: 20 minutes)

Zone 6: Increase VO2max (longest workout: 15 minutes)

Zone 7: Improve Economy (longest workout: 2.5 minutes)

When our randonneur starts doing these higher intensity workouts, Coach Hughes typically has them start with fewer, shorter repeats and work up to more, longer repeats. The length of the workout is (the number of repeats) x (the length in minutes of each repeat). For each zone, he has a maximum number of minutes that an athlete reaches at the end of that progression. Since these Zones are swapped in and out of the same ("red") day in the schedule, one might infer that the maximum minutes, which are different for each zone, represent the same training load. Since Load = Intensity x Minutes, one can infer the relative Intensity by dividing the constant Load by the variable Minutes (i.e., Intensity = Load / Minutes). But is it true that all of these zones have an equivalent load? Here is what Coach Hughes says:
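The inference above can be sketched in a few lines, assuming (as the paragraph does) that every zone's longest “red day” workout carries the same Load. The minutes are the ones listed above; the names `LONGEST_MINUTES` and `relative_intensity` are illustrative, and Sweet Spot is used as the reference simply because it has the longest workout.

```python
# Longest recommended "red day" workout per zone, in minutes (from the list above).
LONGEST_MINUTES = {
    "Sweet Spot": 40,
    "Zone 4": 30,
    "Zone 5": 20,
    "Zone 6": 15,
    "Zone 7": 2.5,
}


def relative_intensity(longest_minutes, reference):
    """Assume every zone's longest workout carries the same Load.

    Since Load = Intensity x Minutes, Intensity = Load / Minutes, so the
    intensity of a zone relative to `reference` is reference_minutes / zone_minutes.
    """
    ref_minutes = longest_minutes[reference]
    return {zone: ref_minutes / minutes for zone, minutes in longest_minutes.items()}


for zone, value in relative_intensity(LONGEST_MINUTES, "Sweet Spot").items():
    print(f"{zone}: {value:.2f}")
```

Under this assumption, a minute of Zone 7 would be 16 times as fatiguing as a minute of Sweet Spot.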

“The harder the intensity, the more days of recovery you need between sessions. You may do two days of tempo workouts in a row if you can do a quality workout the second day. Allow at least one recovery day between sweet spot workouts and at least two days between sub-threshold, super-threshold, VO2 max and sprint workouts.” - Coach Hughes, in Intensity Training

Thus, taking Hughes at his word, it takes twice as long to recover from a Sweet Spot workout as it does from a Zone 3 workout, and three times as long from a workout in Zones 4 through 7. The good news is that, by inference, the Zone 4 through 7 workouts each add up to the same training load. The difference in recovery times between Sweet Spot (Zone 3.5, if you will) and Zone 4 is small, and I will ignore it. (If I did include it, it would only increase the already large trend I am suggesting.)
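The “twice as long” and “three times as long” figures come from counting the workout day plus its minimum recovery days. Here is a sketch of that arithmetic, assuming the mapping used above of the quote's sub-threshold, super-threshold, VO2 max, and sprint workouts onto Zones 4 through 7; the names are hypothetical.

```python
# Minimum recovery days between sessions, per the Intensity Training quote
# (sub-threshold .. sprint mapped onto Zones 4-7, as assumed in the text).
MIN_RECOVERY_DAYS = {
    "Zone 3 (tempo)": 0,  # may be ridden two days in a row
    "Sweet Spot": 1,
    "Zone 4": 2,
    "Zone 5": 2,
    "Zone 6": 2,
    "Zone 7": 2,
}


def relative_recovery_cost(min_recovery_days, reference="Zone 3 (tempo)"):
    """Recovery cycle = the workout day plus its recovery days, relative to `reference`."""
    ref_cycle = min_recovery_days[reference] + 1
    return {zone: (days + 1) / ref_cycle for zone, days in min_recovery_days.items()}


print(relative_recovery_cost(MIN_RECOVERY_DAYS))
```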

What about Zones 1, 2, and 3? Zone 1 is only used for recovery rides; the goal of these rides is not to increase fatigue but to reduce it. The bad news is that my approach cannot be used to estimate the fatigue generated by Zone 1 recovery rides. The good news is that there is no need to do so; Zone 1 rides can be ignored as a source of fatigue.

The Zone 3 ("blue") ride occupies a different slot in Coach Hughes’ training plan than the higher intensity ("red") rides, so there is no reason to expect that it will generate the same amount of Fatigue as they do. The one clue we have is Coach Hughes’ statement that Zone 3 rides can be ridden two days in a row, whereas the higher intensity rides can only be repeated every two to three days. From that, we might conclude that the higher intensity ("red") rides generate two to three times as much total fatigue as the Zone 3 ("blue") ride. The longest Zone 3 ride, both in Distance Cycling and in Intensity Training, is 90 minutes. If I were to argue that these 90 minutes generated only half the fatigue of the 40 minutes of total Sweet Spot (Zone 3.5, "red") riding, then I would have to conclude that the Intensity (Fatigue per Minute) of the Sweet Spot ride was (90 / 40) x 2 = 4.5 times that of the Zone 3 ride. Compare this to the conventional TRIMP estimates, which put it at most 1.25 times greater. At this point, I want to reiterate that I am aware of how tentative my argument is. I feel very strongly that the conventional TRIMP estimates significantly underestimate the Intensity of higher intensity rides but am much less sure by exactly how much. Thus, to be conservative, I am ignoring the two-fold multiplier and suggesting that a Sweet Spot ride has 2.25 times the Intensity of a Zone 3 ride.
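The Sweet Spot versus Zone 3 arithmetic can be written as a one-line function. This is only a restatement of the calculation above; `load_ratio` (a hypothetical parameter name) is how many times more total Load the Sweet Spot workout is assumed to carry than the Zone 3 ride.

```python
def intensity_ratio(low_minutes, high_minutes, load_ratio=1.0):
    """Intensity(high) / Intensity(low), given that the high-intensity workout
    carries `load_ratio` times the total Load of the low-intensity one.

    From Load = Intensity x Minutes:
    Intensity(high) / Intensity(low) = (low_minutes / high_minutes) * load_ratio.
    """
    return (low_minutes / high_minutes) * load_ratio


zone3_minutes = 90       # longest Zone 3 ("blue") ride
sweet_spot_minutes = 40  # longest Sweet Spot ("red") workout

print(intensity_ratio(zone3_minutes, sweet_spot_minutes, load_ratio=2))  # 4.5
print(intensity_ratio(zone3_minutes, sweet_spot_minutes))                # 2.25 (conservative)
```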

The weakest link in my argument concerns the relative Intensity of Zone 2 (long, "yellow") rides and the higher Intensity rides. Again, they occupy a separate slot in Coach Hughes’ training plan, so there is no basis for assuming they generate the same amount of Fatigue as the higher Intensity rides. I don’t know how to fix that, so I won’t try; for no good reason, I will assume that the long (“yellow”) ride generates the same amount of total Fatigue (Intensity x Time) as the Zone 3 (“blue”) and higher intensity (“red”) rides. The longest Zone 2 training ride Hughes recommends is 210 minutes.

The intensity I am calculating is relative. For convenience, I set the intensity of a Zone 2 ride equal to 1* and, for each of the higher zones, the intensity given is how much harder that ride is per minute than a Zone 2 ride. Thus, for each zone, I calculate the intensity as the length of the longest Zone 2 ride divided by the length of the longest ride in that zone, and these are the results of that calculation:

The first column is the training zone number. I have never seen a case where rides in Zone 1 are used to build fitness, and, as noted above, this means I will not be using Zone 1 in my estimation; it starts with Zone 2. The second column, Zone Intensity, gives the relative Intensity implied by the recommendation that training load (which by definition equals Time multiplied by Intensity) be determined by multiplying ride time by zone number. It is useful to arbitrarily set Zone 2 to an Intensity of 1, so that Zone 4 has a relative Intensity of 2, and so on. The third column, Hughes Minutes, is the maximum number of total minutes in a workout (day) that Coach Hughes suggests for each zone. Using my logic, I then convert this into a relative implied Intensity by arbitrarily setting Zone 2 to an Intensity of 1 and then multiplying that by the ratio (minutes in Zone 2) / (minutes in the Zone). Thus, for Zone 4, Hughes recommends a maximum of 30 minutes. In his overall training plan to prepare for a 200 kilometer ride, he suggests a maximum Zone 2 ride length of 400 minutes. 400 divided by 30 gives an Intensity relative to Zone 2 of 13.3, much higher than the zone-number-based estimate of 2. In a previous post, I used the data from Gillen et al. to do a similar estimate, and, for comparison, that is shown in the final column. Gillen et al. only looked at Zone 2 and Zone 7, so I put n.d. in the remaining positions of the table to indicate that the value was Not Determined.
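Here is a sketch of how the Zone Intensity and Hughes intensity columns can be computed. It is not a copy of the table: it assumes equal Load for the longest workout in every numbered zone (Sweet Spot is left out because it has no zone number), and it uses the 400-minute Zone 2 ride from the worked example as the reference. The text also mentions 210 minutes as the longest Zone 2 training ride, so the reference is kept as an editable value in the hypothetical `HUGHES_MINUTES` dictionary.

```python
# Longest workout (minutes) per numbered zone. The Zone 2 value is the 400-minute
# ride from the 200K plan used in the worked example; change it to test other choices.
HUGHES_MINUTES = {2: 400, 3: 90, 4: 30, 5: 20, 6: 15, 7: 2.5}


def intensity_table(hughes_minutes):
    """Rows of (zone, zone-number intensity, Hughes minutes, Hughes intensity),
    with both intensities normalized so that Zone 2 = 1."""
    zone2_minutes = hughes_minutes[2]
    rows = []
    for zone, minutes in sorted(hughes_minutes.items()):
        zone_intensity = zone / 2                   # Load = Minutes x Zone Number
        hughes_intensity = zone2_minutes / minutes  # equal Load per longest workout
        rows.append((zone, zone_intensity, minutes, hughes_intensity))
    return rows


for zone, zone_int, minutes, hughes_int in intensity_table(HUGHES_MINUTES):
    print(f"Zone {zone}: zone-number {zone_int:.1f}, {minutes} min, Hughes {hughes_int:.1f}")
```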

Am I guilty of the straw-man fallacy? Does anyone actually estimate ride load by multiplying zone number by minutes? Every single scientific paper I have read that considers ride intensity uses one of the common versions of TRIMP to estimate that intensity. In a perfect world, those papers would have justified their use of that metric, but in my opinion, they do not. Some studies do go so far as to show that TRIMP scores move in the right direction and are better than nothing, points I do not dispute, but then go on to use them in a way that relies on them being quantitatively accurate, which has not been demonstrated and which I believe is not true. As just one example, consider Vermeire et al., which I reviewed about a year ago. That publication concluded that polarization of training was more important than training volume because they found that improvements in performance correlated with degree of polarization but not with TRIMP scores. (They looked at Banister, Edwards, Lucia, and individualized TRIMP.) Perhaps there would have been a correlation with TRIMP scores had they used a more quantitatively correct version of TRIMP.

In contrast, coaches give TRIMP very little, if any, attention. Rather, they provide concrete training suggestions: how long an intense effort should last (20 seconds, 1 minute, 10 minutes...) and how many times that effort should be repeated. What I am arguing for is connecting the scientific community with the wisdom of coaches. This is an approach that Dr. Seiler (father of polarized training) has adopted. He argues that laboratory studies are limited in what information they can provide and thus need to be supplemented with studies of the training approaches used by successful athletes and their coaches.

If I were reviewing this post, my biggest complaint would be the lack of any experimental evidence that my approach is helpful. My response is to concede the point but then to note that this post is not intended as proof of anything but rather as a reality check and a suggestion for future research. If the actual recommendations of coaches (the recommendations of Coach Hughes I used in this post are pretty typical) do not match the TRIMP protocols we are using, should we not worry about that? In short, I think coaches do not need improvements to TRIMP but rather can provide suggestions as to how to improve it. Exercise research scientists, on the other hand, often use TRIMP and thus would benefit from the improved versions of TRIMP that coaches can provide, based on their experience.

^ I confess that I understand individualized TRIMP least well of all of these and if you feel like you do understand it, please tell me about it in the comments.

* Because of the way Banister TRIMP is calculated, and because that calculation generates a value of 0.9 for Zone 2, a value very close to 1.0, I didn't bother to correct Banister TRIMP numbers.

Thursday, July 15, 2021

What Do I Do Now?

[Next post I will return to my series of posts on the theory of Fatigue and Training Intensity. I interrupted that series both because I thought those remaining posts could benefit from more time and because I wanted to address my failure to prepare for a May metric century this year.]

Last post, I developed some tools for looking at my training data. I identified four similar routes that I ride frequently such that my average speed when I ride any one of these routes can be used interchangeably to assess my Form (my ability to ride fast and long which is increased by an improvement in my Fitness and decreased by a buildup of Fatigue.) I refer to rides on these four routes as Alpine-Like rides. For the purposes of this post, I assume my average speed on Alpine-Like rides is a measure of my Form. This may be a bit of an oversimplification but I believe it to be a useful approximation.

In my previous post I also developed some statistical techniques for more objectively examining ride data. In that post, I bemoaned my inability to prepare for The Art of Survival, a group ride I had hoped to attend last May. To understand that failure, I focused on my rides in the months before and used statistics to try to determine if my failure was truly due to a lack of Form (it was.) In this post I am zooming out to look at all the Alpine-Like rides I have completed since moving to California in September of 2017 to see what they can tell me about how I have been training and the impact of that on my Form. The graph at the top of this post shows my average speed on all 231 such rides completed between October 15, 2017 and May 31, 2021.

If you squint hard enough at that graph, some trends might be apparent. What is crystal clear, however, is that there is a huge amount of day to day variability and the rides are very unevenly spaced over time. For example, note the very dense cluster of rides during the middle of 2018. It seems I really liked doing Alpine-Like rides back then! The uneven spacing has the potential to bias any statistical analysis by giving too much weight to that one time period. Primarily to correct that bias, and secondarily to smooth the data a bit, I have taken to grouping the data by month. All of the Alpine-Like rides in a given month are averaged and treated as one datapoint. The results of doing that are shown in the blue line in the next graph:
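A minimal sketch of the monthly grouping, assuming each ride is recorded as a (date, average speed) pair; the function name and the rides are hypothetical, not my actual data. Keeping the ride count alongside each monthly mean makes it easy to drop months with too few rides later.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(rides):
    """Group rides by calendar month.

    rides: iterable of (ride_date, average_speed_mph) pairs.
    Returns {(year, month): (mean_speed, ride_count)} in chronological order.
    """
    by_month = defaultdict(list)
    for ride_date, speed in rides:
        by_month[(ride_date.year, ride_date.month)].append(speed)
    return {month: (mean(speeds), len(speeds)) for month, speeds in sorted(by_month.items())}


# Hypothetical rides: two in June 2018, one in July 2018.
rides = [(date(2018, 6, 1), 12.0), (date(2018, 6, 15), 13.0), (date(2018, 7, 2), 11.5)]
print(monthly_means(rides))  # {(2018, 6): (12.5, 2), (2018, 7): (11.5, 1)}
```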

The red line on the graph is a running average of 3 months centered over each month to further smooth the data. When normalized and smoothed in this way, trends become much more apparent. It appears there was a slow increase in my Form after I got to California, but that towards the end of 2018, there was a more rapid decrease. Then, my Form started increasing around May of 2019, an increase that continued until the beginning of 2020 when my Form reached an all time high. That high was so dramatic that it caused me to (incorrectly) speculate that it was due to changes that my local bike shop made to my bicycle, a speculation about which I blogged. That rapid increase in Form was followed by an even more rapid decrease, then another increase followed by a decrease leading to the plateau of low Form that kept me from riding The Art of Survival last May, the plateau that was the subject of my last post.
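The centered 3-month running average can be sketched as follows. Two assumptions are mine rather than the text's: the input is a consecutive month-by-month sequence, and at the two ends of the series only the available neighbors are averaged. The monthly values are hypothetical.

```python
def centered_moving_average(values, window=3):
    """Centered running average over an odd-sized window.

    At the ends of the series only the available neighbors are averaged
    (an assumption; other edge rules are possible).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


monthly_speeds = [12.0, 13.0, 14.0, 12.0]  # hypothetical monthly means (MPH)
print(centered_moving_average(monthly_speeds))  # [12.5, 13.0, 13.0, 13.0]
```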

Except for the calculation of averages, I have not used any statistics up to this point; all my conclusions are based on subjective eyeballing of the data. To a large extent that is because I am not sure how to apply statistics in this case. One significant complication in doing so is that I have looked at the data too much to be able to do a valid statistical analysis of it. This is a very counterintuitive fact about statistics to which many students object but which all professional statisticians agree is true. In order for a statistical analysis to be valid, you have to frame any hypotheses you want to test before ever looking at the data. You can divide the data in half, look at one half, develop hypotheses, and then test them on the second half, just so long as you don’t revise your hypotheses based on what you see in the second half of the data. What is so counterintuitive about this rule is that the apparently identical analysis done on the apparently identical first and second halves of the data yields invalid and valid results respectively. This turns out to be a consequence of the multiple testing problem I referred to in my last post. If you test 20 hypotheses, all of which are wrong, on average one of them will test as statistically significant at the traditional P ≤ 0.05 level. That is, in fact, what P ≤ 0.05 means: when a hypothesis is wrong, there is still a one in twenty (5%) chance of seeing a result at least that striking by random chance alone. When one looks at a dataset, one's brain is rapidly testing an uncountable number of hypotheses and so, using the subjective eyeball approach, will find a few that look significant but are just due to random chance. You cannot even correct for this because you have no idea how many tests your subconscious brain did. The good news is that there is no reason to expect those same chance fluctuations to be present in a second dataset at which you have not looked. Unfortunately, I have no second dataset, so all of my statistical analyses are suspect.
That said, I feel like I am better off doing them than not so long as I do not overestimate my certainty. To further help minimize chance associations, I have started to use systematic analytical approaches to minimize the amount of cherry picking I am doing. One such systematic approach is to always analyze my data by calendar month rather than doing what I did in my last post, selecting an arbitrary group of rides that looked low and then testing that visually identified set with statistics.
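The “20 wrong hypotheses, one false positive” arithmetic from the multiple testing discussion can be checked directly. This sketch assumes the tests are independent, which real eyeballing certainly is not; it also shows that the chance of getting at least one false positive among 20 such tests is well over half.

```python
def false_positive_stats(n_tests, alpha=0.05):
    """For n_tests independent tests of hypotheses that are all wrong (true nulls):
    the expected number of 'significant' results and the chance of at least one."""
    expected = n_tests * alpha
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    return expected, p_at_least_one


expected, p_any = false_positive_stats(20)
print(f"expected false positives: {expected:.1f}, P(at least one): {p_any:.2f}")
```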

That arbitrary calendar month grouping is far from a perfect solution; a month is both too long a time interval and too short. It is too long because a lot can happen in a month, and interesting transitions can be lost depending on where they fall in the calendar. It is too short because some months don't have enough Alpine-Like rides for a meaningful statistical comparison, a fact that will come up in the analyses below. However, it was the best solution I could think of to remove some of the subjectivity from my analyses.

To take advantage of my monthly summary data, I used a one-sample t-test to ask, for each of the months I have been in California, whether my average speed on Alpine-Like rides that month was significantly different from my overall average speed of 12.26 MPH. As I noted in my last post, this suffers from the multiple testing problem: if you do enough comparisons, you will see "statistically significant" differences that are due just to chance. For that reason, I correct for multiple comparisons.
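For anyone who wants to repeat this kind of comparison, here is a standard-library sketch of a two-sided one-sample t-test. The P value comes from integrating Student's t density with Simpson's rule, a numerical shortcut chosen so the example needs no extra packages; `scipy.stats.ttest_1samp(sample, popmean)` does the same job if SciPy is installed. The ride speeds in the example are hypothetical, and a month needs at least two rides that are not all identical.

```python
import math
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule to integrate the density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def one_sample_t_test(sample, mu):
    """Compare one month's ride speeds (sample) with an overall mean speed mu."""
    n = len(sample)
    t = (mean(sample) - mu) / (stdev(sample) / math.sqrt(n))
    return t, n - 1, two_sided_p(t, n - 1)


# Hypothetical month of four Alpine-Like rides compared with 12.26 MPH.
t, df, p = one_sample_t_test([11.4, 11.8, 11.6, 11.9], 12.26)
print(f"t = {t:.2f}, df = {df}, P = {p:.4f}")
```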

Since moving to California, I have ridden for 44 months. I decided that I wanted at least 4 rides in a month before comparing its average speed to the overall average. When I removed months with fewer than 4 rides, there were 27 months left to compare, and 5 of those were significantly slower or faster than average. The uncorrected P values for these differences are P=0.00051, P=0.00173, P=0.00259, P=0.01265, and P=0.01298. I used the Holm correction from The Primer of Biostatistics, the book I mentioned in my last post, to correct for the 27 comparisons I did to find those five apparently significant ones. When I did that, only the two most significant differences (one faster, one slower) remained significant at the P=0.05 level. The third was significant at the P=0.06 level: I cannot be 95% sure it is real, but I can be 94% sure. For the last two, I can be 75% sure they are real; more likely than not they are, but they need to be taken with a grain of salt. There are other reasons for thinking that these 5 are all real (for example, they are found in parts of the graph where the surrounding months are similarly fast or slow). Similarly, it is almost certainly true that there are additional months that are significantly faster or slower as well, but both noise in the data and lack of a sufficient number of observations prevent them from reaching statistical significance.
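Here is a sketch of the Holm step-down correction applied to the five P values above out of 27 comparisons. It assumes the Šidák form of each step, 1 - (1 - p)^k, which gives adjusted values of roughly 0.014, 0.044, 0.063, 0.26, and 0.26, close to the 94% and 75% figures quoted; the plain Bonferroni form, k x p, is included for comparison. Passing only the smallest P values is valid here because the other 22 comparisons all had larger P values. The function name is illustrative.

```python
def holm_adjust(p_values, n_comparisons=None, method="sidak"):
    """Holm step-down adjusted P values.

    p_values: P values to adjust; may be only the smallest few of a larger family,
        provided every omitted P value is larger than all of these.
    n_comparisons: total number of comparisons made (defaults to len(p_values)).
    method: "sidak" uses 1 - (1 - p)^k at each step; "bonferroni" uses k * p.
    """
    m = n_comparisons if n_comparisons is not None else len(p_values)
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_max = 0.0
    for rank, i in enumerate(order):
        k = m - rank  # number of hypotheses still in play at this step
        p = p_values[i]
        if method == "sidak":
            adj = 1 - (1 - p) ** k
        elif method == "bonferroni":
            adj = min(1.0, k * p)
        else:
            raise ValueError(f"unknown method: {method}")
        running_max = max(running_max, adj)  # keep adjusted P values monotone
        adjusted[i] = running_max
    return adjusted


monthly_p = [0.00051, 0.00173, 0.00259, 0.01265, 0.01298]
for p, adj in zip(monthly_p, holm_adjust(monthly_p, n_comparisons=27)):
    print(f"raw P = {p:.5f}  adjusted P = {adj:.3f}")
```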

Interestingly, only one of the five months I found was faster than average. Based on my eyeballing of the data, that surprised me. When I looked at months that I expected to be faster, I found that, in most cases, they were not flagged because they contained fewer than 4 Alpine-Like rides. This suggested to me that fast rides might be due more to a reduction in Fatigue from less riding than to an increase in Fitness from more. However, just because I did fewer Alpine-Like rides doesn't mean I did less riding. There are many other routes I ride (though none of them often enough to be used to assess Form), so I added total minutes ridden per month on all rides to my monthly summary data and plotted that against ride speed, with the significantly faster or slower months flagged:

The blue line is my average ride speed for the month, the dotted red line is my total minutes of riding for the month on all routes, and the significantly faster or slower months are flagged with a yellow dot. To my eye, there is no relationship whatsoever between minutes ridden and speed. Of course, minutes ridden is a massive oversimplification of training load; a minute in Training Zone 6 is entirely different from a minute in Training Zone 1 (6 is hard and fast, 1 is slow and easy). I tag my rides as Easy, Pace/Long, and Brisk and considered giving them different weights based on that, but decided that was much too subjective; in the end I would just be playing with the data to get the answer I wanted. If I want to go to that level of sophistication, I think I would need to start riding with a heart rate monitor again. The reason I was willing to do the analysis that I did, assigning all rides equal weight, is that I am not specifically doing interval training at present; I ride all my rides at a similar, more or less comfortable pace, so assigning them the same weight might make some sense. I am not totally comfortable with this argument: I think hillier rides leave me more tired, and I have considered doing some sort of “feet of climbing” correction. Also, recovery rides I do on my trainer are, by design, much easier, though subjectively I feel that they do produce some Fatigue. However, for the moment, I think equal weight is the best I can do.
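One optional way to put a number on the eyeball impression is a Pearson correlation between monthly minutes ridden and monthly average speed. This is a sketch, not an analysis I have run: the monthly values below are hypothetical, an r near zero would only rule out a simple linear relationship (not lagged effects of training), and on Python 3.10 or later `statistics.correlation` gives the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly totals: minutes ridden and mean Alpine-Like speed (MPH).
minutes = [1250, 980, 1400, 1100, 870]
speeds = [12.1, 12.5, 11.9, 12.3, 12.4]
print(round(pearson_r(minutes, speeds), 2))
```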

The last graph is similar to the previous one except rather than flagging significantly fast or slow rides, I flagged the metric centuries for which I prepared:

The red dots are The Art of Survival, the gold dots are Golden Hills, the blue dot is a solo metric century I did here on the peninsula, and the green dot is not a metric century but flags the month in which I had my all-time fastest Alpine-Like ride, ridden at 14.1 MPH. The most recent red dot flags the only metric century for which I prepared but did not ride, the 2021 running of The Art of Survival. This figure strongly supports my decision not to attend that ride: my Form was at an all-time low, much lower than for any of the metric centuries I did ride. Again, minutes of training don't appear to have anything to do with Form at the time of the ride. One thing I do note by eyeball is that both times I rode The Art of Survival and Golden Hills, my Form at the time of the Golden Hills ride was better. During the 2019 season, this trend seems to have continued: one month after the Golden Hills metric century I rode a solo metric century, and my Form was even better. An incidental observation is that three months after that solo metric century, I rode my fastest ever Alpine-Like ride.

I am at a loss to explain any of my observations above and in fact worry that there is a chicken-and-egg confusion in my thinking. Suppose my training does not determine my Form, but rather, my Form determines my training? Maybe when my Form is good I feel good and I ride harder. But then what is determining my Form? I confess I haven't a clue. I would like nothing better than to have a repeat of my 2019 season (even though my Art of Survival that year was utter misery) but haven't a clue how to do that. Maybe it is just out of my hands. Maybe I just have to take my Form as it comes, relax when it is low, and go for it when it is high. If so, that would again validate my decision to skip The Art of Survival this year. Still, I somehow have to decide what my weekly ride schedule should be. Last post, I mentioned that when I looked back at my subjective descriptions of how I felt, they didn't seem to be of much use; they didn't seem to correlate with Form or anything else. Maybe the problem is not with my subjective sense of Fatigue but with how I record it. Maybe what I am doing is just fine, riding harder when I feel better and easier when I don't.

One change I am making, at least for the moment, is to ride a bit less in general and to relax what had been my fierce determination to ride at least 300 minutes a week and at least 4 rides a week. (Obviously, this is based on the assumption that I am riding too much rather than too little, an assumption I can neither justify nor refute but which my "gut" tells me is true.) One thought that keeps coming back to me is that I am under-appreciating the impact that hills have on my training load. When I moved to California from Texas in 2017, my rides became much hillier. When I moved from San Carlos to Emerald Hills in 2020, they became even hillier. Where did I come up with the idea that I should always ride at least 300 minutes a week? The medical community recommends 300 minutes of Moderate aerobic exercise a week or 150 minutes of Vigorous aerobic exercise a week. The definitions of “Moderate” and “Vigorous” are many and varied. Based on those, I had been assuming that my rides represent a Moderate intensity of exercise. I have recently been reconsidering that and wondering if Vigorous intensity is a better description of my rides. More than that, there are the results in the paper Gillen et al. that I refer to so often. It argues that High Intensity Interval Training counts much more than even Vigorous exercise; that 6 or 7 minutes a week of all out sprinting would be enough to satisfy my medical needs. During the course of many of my rides (including the Alpine-Like rides) there are hills that really leave me panting. These climbs are probably less than the all out sprint evaluated by Gillen et al., but they are way beyond Vigorous. Thus, although it is hard to be sure exactly how to count my rides against the Medical recommendation, I am comfortable about relaxing the 300 minutes a week I had been trying for.
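The pairing of 300 Moderate minutes with 150 Vigorous minutes implies that one Vigorous minute counts as two Moderate minutes. Here is a sketch of counting a week's riding that way; the function name and the example split are hypothetical, and it deliberately says nothing about the much larger weighting for all out sprinting suggested by Gillen et al.

```python
def moderate_equivalent_minutes(moderate_minutes, vigorous_minutes):
    """Weekly exercise expressed as Moderate minutes, counting each Vigorous
    minute as two Moderate minutes (300 Moderate ~ 150 Vigorous)."""
    return moderate_minutes + 2 * vigorous_minutes


print(moderate_equivalent_minutes(0, 150))   # 300
print(moderate_equivalent_minutes(120, 90))  # 300
```

Under this counting, reclassifying rides from Moderate to Vigorous halves the number of minutes needed to reach 300.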
As for the coaches' recommendation of at least 4 rides a week, discussing all the reasons for reconsidering it would be a post in and of itself, but for many reasons, I am comfortable relaxing that minimum requirement as well. Will this reduction in riding help or hurt? Stay tuned to find out.

Tuesday, June 22, 2021

Apologies to Those Who Comment

Probably none of you are still here, probably none of you will ever see this, and yet I must apologize. I am sorry the comment you so generously wrote in response to one of my blog posts never appeared below that post, that it seemed to vanish into the blogosphere. That was not intentional, I promise you! The good news, such as it is, is that your comment has finally made it onto my blog.

When I started blogging in 2012, whenever someone commented on one of my posts, I would get an email. I could decide to approve the comment, which would then appear on my blog, or not approve it, in which case it would not. Mostly I approved comments, rejecting only those that clearly were advertisements having nothing to do with my post. I never got very many comments, so when, sometime in 2018, I stopped getting them at all, I didn't notice.

Recently, I had an unrelated issue with Blogger. A draft post that I had been working on for months simply vanished one day and nothing I could do would bring it back. I searched the Internet for a way to get my content back and found that this was a known, replicable bug that Google (which maintains Blogger) is apparently ignoring. All of a sudden I became very concerned about protecting my content. In the past, there had been a way of backing up an entire blog with one click, so I went looking for it, only to find that it had apparently been silently removed. What I ended up doing to back up my blog was to go through it post by post, all 200+ posts, and print them as PDF files. (I had many fewer drafts and saved those by cutting and pasting them into a file on my desktop computer.) However, before I gave up and took that approach, I went through every option and setting in Blogger looking for that missing backup capability, and in so doing, came across an option I had never seen before, an option called "Comments." Clicking on it, I found nine or ten comments on various posts, sitting in limbo, waiting for my approval. Without any notification whatsoever, Google had stopped sending me emails, so I never knew to look. I immediately approved them all, but for some of you, that was three years late. Sorry!

I have a love/hate relationship with Google (more love than hate). In terror of losing my blog's content, where did I paste my draft blog posts? Into Google Docs, that's where. More than that, my entire digital history is backed up onto Google Drive. The biggest problem with Google software is that it is free. At that price, what right do I have to complain about anything? (I do pay for my Google Drive backup space.) Several weeks ago, I was fuming over another Blogger problem and ran across a post advising all future bloggers to avoid Google's Blogger software in favor of WordPress, a paid product that the poster claimed was more reliable, feature-rich, and credible. A few years ago I had the opportunity to work with WordPress, and I think that if I could set my Wayback Machine to 2012 when I started my blog, I might advise myself to pay the money and take that approach. (Switching now would be much more problematic.) Realizing the absurdity of pasting my draft content into Google Docs to protect myself from the flaws in Google Blogger, I looked to see what it would cost me to go back to Microsoft Word, and I have started to wonder if I should be taking another look at Apple's iCloud for backup. But what assurance do I have that Apple or Microsoft or WordPress wouldn't let me down as well?

This post is meant to be an apology to those kind folks who commented on my blog only to have their comments (accidentally) ignored, not an assault on Google. There is no such thing as perfection in this world, and certainly not in the computerverse. I have been involved with computers since 1980 and for most of that time have felt that too much attention was being paid to novel features and not enough to stability and reliability. About twenty years ago, the IT staff at the university where I worked begged me to move all my emails onto their system, promising me they would preserve them forever. A few years later, on the advice of university attorneys, they deleted all but the last three years of my emails. When I complained bitterly, they merely shrugged. So Google is far from unique in struggling with these issues. Nonetheless, I continue to be frustrated by problems which, though not unique to Google, are problems to which Google is not immune. But mostly I wanted to explain to you, kind commenter, what went wrong.

Monday, June 7, 2021

What Is Truth?

Ride speed on my New Alpine and New Alpine-Cañada routes since my move to Emerald Hills. Ride speeds are in blue. The curve in red is a moving average of 8 rides centered around each time point.

A recurring theme on this blog is the impact of long term fatigue on my cycling. Back in 2012, 2013, and 2014, I was attempting to be a randonneur, a long distance cyclist who does rides of 200 to 1200 kilometers (km). During those years, I only managed two such rides, both at the shortest 200 km distance, and the biggest factor in my failure to complete more was long term fatigue. When I moved to California three and a half years ago, I decided to switch to shorter rides, 100 km rides known as metric centuries. My hope was that I could do the 100 km rides more frequently, and that fatigue would be less of an issue with them than it had been when training for and riding the 200 km rides. That has definitely been the case. However, less of an issue and not an issue are not the same thing. Although reduced, I believe that long term fatigue is still an issue for my 100 km rides. Or is it? Was fatigue ever the issue I thought it was? As a scientist, I always question my assumptions, and there is a lot to question here: as much as I have discussed long term fatigue, I still have a lot of uncertainty about it, an uncertainty that recently manifested itself yet again.

Let's start with some definitions: Form = Fitness - Fatigue. Form is what determines how fast I can potentially complete a ride, how long a ride I can complete, etc. If I haven't been training, my Fitness will be low and I will not be able to ride very quickly or very far. On the other hand, I might be very fit but completely exhausted from the training required to develop that Fitness; in that case my Fatigue is high, and so again my Form, and thus my speed on a ride, will be low. Also, factors other than cycling can cause Fatigue: other kinds of exercise (e.g. yard work), emotional stress, lack of sleep, illness, etc. Finally, this is not a quantitative equation. The biological mechanisms underlying Form, Fitness, and Fatigue are far from completely understood, there are no generally agreed-upon ways of measuring them, and thus there are no units of Form, Fitness, or Fatigue. Form = Fitness - Fatigue is a conceptual equation. This post is about Form. To what extent my Form at any moment in time is due to Fitness or due to Fatigue has been and will be discussed elsewhere.

What inspired this post? As the COVID-19 pandemic has started to wane, I had hoped to restart my metric century group rides last month by riding the Art of Survival on May 29. However, I was not able to do that. I tried to replicate the very successful training plan I had developed for the 2019 Golden Hills metric century, which included a weekly 33 mile ride in the months before the event, then a 44 mile ride 4 weeks before and a 54 mile ride 2 weeks before. I had been successful at doing the 33 mile ride almost every week and completed the 44 mile ride on schedule, but there were warning signs. The two observations I have been able to use to measure my long term fatigue are how fast I ride and how I feel. By "how I feel" I mean: do my legs feel sore and tired? Do I feel generally lethargic? Am I more grumpy than usual; do little things bother me more than they should? Am I unenthusiastic about starting a ride, and do I find it an unpleasant slog once I go? As I attempted to train for the Art of Survival, that is how I felt. However, when I looked back at my subjective fatigue data, I found it unconvincing. I think I am just a pessimist: my evaluation of "how I felt" during the vast majority of my rides can be summarized as "I felt bad." Maybe I'm just old. Bill Clinton famously said that, after age 50, if you wake up in the morning and nothing hurts, you know you died during the night.

The second way I evaluate my readiness for a challenging ride is how fast I am riding. Currently, there are two routes I ride regularly enough that I can use them to assess my speed, named "New Alpine" and "New Alpine-Cañada". I felt like my speed on these rides was slow. The central question of this post is: was this feeling of slowness because the rides really were slow, or was it a subjective illusion?

The New Alpine and New Alpine-Cañada routes are similar to each other and similar to routes I rode back in the Fall of 2019 (routes named "Alpine" and "Alpine-Cañada"), a time when I was riding well, and thus I might be able to compare speeds between now and then. But doesn't the speed of a ride depend as much on how fast I choose to ride as on my fatigue level? To some extent, yes. (I have blogged about that.) That said, I firmly believe that the speed at which I complete these rides is governed much more by my Form than by a conscious decision; I tend to ride them at a relatively constant subjective feel, not holding back but at a speed I feel I could maintain indefinitely. Thus, when I notice that, over several rides, the speed of these rides is lower than average, I take note. Unfortunately, this is still a subjective assessment. Yes, the speed of a ride is objective, but what I consider fast and what I consider slow is subjective, as is how many slow rides it takes before I conclude I am suffering from fatigue. I wanted to use statistical analysis to make this assessment less subjective and more quantitative, and that is the subject of this post.

I think I have a pretty good understanding of the theory of statistics but I am not a statistician; I lack the years of experience that made the statisticians I have worked with over the years so valuable. Also, the ride data I want to analyze is quite different in structure from the experimental data for which standard statistical tools were developed. Statistics is always as much an art as a science, and even the best statistical approaches will not give informative results if the underlying data is not what it is assumed to be. With all those pitfalls in mind, I have taken a very redundant, conservative approach to my analysis, using different tools to cross-check that I wasn't making a silly mistake and staying as close to first principles as I could so as to avoid the oh so common errors that result from plugging data into the wrong formula or algorithm. Finally, I would mention the importance of thinking carefully about exactly what question I am answering with any specific analysis.

I noted above that I thought I rode four of my routes at speeds that were, on average, the same. Here is the data on those four routes:

Back when I first started riding the Alpine and Alpine-Cañada routes, I predicted that, because the Alpine-Cañada route was longer, it would on average be slower, but that has not been my experience. Subjectively, I noted that my average speeds on the two routes tended to be similar. On the other hand, looking at the small changes in the routes caused by my move from San Carlos to Emerald Hills, I predicted that, in the long run, the average speeds on the new routes and the old would be very close to the same. However, what might obscure that similarity in the short term is that my Fatigue seems to have been high for a lot of the time since the move. That illustrates a general complication of the analysis I am trying to do. Even assuming that ride speed is a simple reflection of my Form, ride speeds will not be randomly distributed over time; fast rides and slow rides will cluster, something I needed to keep firmly in mind as I did my analysis.

I could try to assess my Form using just the data from one of the four above datasets, but there would be significant advantages if I could use them interchangeably, and thus my first statistical task was an analysis of variance to determine whether I am formally justified in doing so. To perform the Analysis of Variance (ANOVA), I used the formulas provided in "Primer of Biostatistics, Sixth Edition" by Stanton A. Glantz. (In general, that book was my guide for most of the statistics in this post. Hereafter, I will refer to this book as "Primer of Biostatistics.") Rather than use the table of critical values in that book, I used the Google Sheets FDIST function to calculate the P-value for the hypothesis that the rides from the four routes were from the same "group", i.e. have the same average speed. The equations I used required that the four groups to be compared have the same number of samples. The Alpine-Cañada route is the one I have ridden the fewest times, 29, so I had to take subsets of the other three so that I had four sets of 29 rides to compare. The normal way of doing that is to pick random samples. However, because the data are clustered in time (the speed I ride on Monday tends to be correlated with the speed I ride on Wednesday much more than with the speed of a ride I did six months ago), I used matching instead. Since the set needing the most subsampling, the Alpine rides, is interleaved with the smallest set, the Alpine-Cañada rides, I picked subsamples from the Alpine rides one week before or after an Alpine-Cañada ride when possible, and when that didn't give enough subsamples, scattered the remainder as evenly over time as I could. The New Alpine and New Alpine-Cañada datasets were only slightly too large (30 and 33 samples), and I had reason to believe that the most recent rides were outliers. In addition, I figured there was some virtue in having the rides I selected as close in time to the rides on the other two routes as possible. 
For those reasons, I picked the oldest 29 rides for each of these two routes.
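For readers who want to reproduce this step outside a spreadsheet, here is a minimal Python sketch of a one-way ANOVA for equal-sized groups, assuming each route's rides are stored as a list of speeds in mph. The function name `equal_n_anova` and the data layout are my own illustrative assumptions, not the exact formulas printed in Primer of Biostatistics. It uses the standard equal-sample-size form, in which the within-groups variance is the average of the group variances and the between-groups variance is n times the variance of the group means.

```python
from statistics import fmean, variance


def equal_n_anova(groups):
    """One-way ANOVA F statistic for groups that all have the same size.

    groups: a list of lists, one list of ride speeds per route (hypothetical layout).
    Returns (F, df_between, df_within).
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("all groups must have the same number of rides")
    m = len(groups)
    # Estimate of ride-to-ride variance from the scatter within each route.
    s2_within = fmean(variance(g) for g in groups)
    # Estimate of the same variance from the scatter of the route means.
    s2_between = n * variance([fmean(g) for g in groups])
    f = s2_between / s2_within
    return f, m - 1, m * (n - 1)
```

The resulting F and its two degrees of freedom are what go into the Google Sheets formula `=FDIST(F, df_between, df_within)` to get the P-value. With four routes of 29 rides each, the degrees of freedom would be 3 and 112.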

When I then did an ANOVA analysis on these four sets of 29 rides, I got a P-value of approximately 0.6. That is, if rides on all four routes really had the same average speed, differences at least as large as the ones I observed would show up by chance about 60% of the time. If I had been trying to prove that one of these routes was different (the most common use for this kind of analysis), I would have been disappointed that I failed the P < 0.05 test. Since I am hoping for the opposite, I should be happy, right? Well, as happy as I can be. In my opinion, there is no way to "prove" that rides on all four routes have the same average speed; I can only say I failed to prove otherwise. What my analysis does say is that there is no good reason, based on this data, to separate rides on these four routes from each other, and thus it provides justification for me to treat my ride speeds on any of these four routes the same.

Does common sense agree with the above analysis? Common sense says that my average speed on these four routes cannot be exactly the same; after all, they are different routes and those differences are almost certain to affect average speed. The question I really care about is this: "How big is the impact of route selection on average speed compared to the impact of Form?" Fortunately, statistics has a way to estimate the answer to that question: calculating confidence intervals.

Using the same four subsets of rides as I used for ANOVA, there are six possible pairwise comparisons I could make: Alpine vs Alpine-Cañada, Alpine vs New Alpine, Alpine vs New Alpine-Cañada, Alpine-Cañada vs New Alpine-Cañada, Alpine-Cañada vs New Alpine, and New Alpine vs New Alpine-Cañada. However, it would be a mistake to blindly make all six comparisons. If comparisons are done with the usual P < 0.05 criterion, there is a one in twenty chance that a comparison between two groups that do not really differ will nonetheless be declared significant. With two comparisons, because there are now two chances to get unlucky, there is an almost 10% chance that at least one of the two will be a false positive, and by the time all six comparisons are made, that 5% risk has risen to about 26%. There are ways of correcting for that, but these corrections reduce the sensitivity of the analysis, and so the right thing to do is to make as few comparisons as are necessary to answer the question at hand. I decided to make only one comparison, New Alpine vs New Alpine-Cañada. The reason I picked that one is that I probably will never ride the Alpine and Alpine-Cañada routes again (because my rides are door to door and the location of my door has permanently changed with my move from San Carlos to Emerald Hills), and so going forward, what I really want to know is whether I am justified in pooling my rides on the two new routes. The 29 rides I used from the New Alpine route have an average speed of 12.06 mph, compared to 12.24 mph for those taken on the longer New Alpine-Cañada route. The difference in those average speeds is 0.18 mph. When I revisit that a year from now, when I have 50 or 60 rides on each route, is that difference likely to stay the same, get larger, or get smaller? How close is 12.06 mph to the true average speed I would see on the New Alpine route if I rode it many more times? 
Using the approach outlined in Chapter 6 of Primer of Biostatistics, I calculated that I can be 95% confident that the real difference in average speed over these two routes is between -0.01 and +0.47 mph. That is, the New Alpine route may even be a bit faster than the New Alpine-Cañada route, but it is not likely to be more than 0.47 mph slower. Without belaboring the point, this suggests that any difference in speed between these two routes is unlikely to confound my attempts to determine my current Form using a random mixture of rides on them. Going forward, I will use rides on all four routes interchangeably, referring to the routes as the Alpine-Like routes and to rides on them as Alpine-Like rides.
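The arithmetic behind the multiple-comparison warning and the confidence interval can be sketched in a few lines of Python. This is a hedged illustration, not necessarily the exact procedure in the book: the family-wise calculation assumes the comparisons are independent (pairwise comparisons that share a route are not, so treat it as an approximation), Bonferroni is just one of the standard corrections, and the interval is a pooled-variance t interval with the critical t value looked up in a table rather than computed. All function names are hypothetical.

```python
import math
from statistics import fmean, variance


def familywise_error_rate(alpha, k):
    # Chance of at least one false positive among k comparisons,
    # assuming (for simplicity) the comparisons are independent.
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    # One common correction: test each comparison at alpha / k.
    return alpha / k


def pooled_diff_ci(a, b, t_crit):
    """Confidence interval for mean(b) - mean(a) using a pooled variance.

    t_crit is the two-sided critical t value for len(a) + len(b) - 2
    degrees of freedom, taken from a t table.
    """
    na, nb = len(a), len(b)
    diff = fmean(b) - fmean(a)
    s2_pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(s2_pooled * (1 / na + 1 / nb))  # standard error of the difference
    return diff - t_crit * se, diff + t_crit * se
```

`familywise_error_rate(0.05, 2)` and `familywise_error_rate(0.05, 6)` reproduce the roughly 10% and 26% figures above. For two routes of 29 rides each there are 56 degrees of freedom, for which the two-sided 95% critical t value is about 2.00.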

There is one more approach I would like to introduce before asking the question that inspired this blog post. Given the assumption (tested above) that speeds on all four of the Alpine-related routes can be used interchangeably, I have a total of 230 rides ridden over more than 40 months. This number is so large that I am going to treat it as a statistical universe (a population rather than a sample), which allows me to use a different kind of test to ask questions like "Have my slow speeds over the last two months been slower than expected by chance?" That test is the one-sample t-test, which tests whether the mean of a set of measurements is consistent with a known value. For example, given the above analysis, I am now claiming that I know that my average speed on any mixture of the Alpine routes is 12.26 mph. Using this approach, I don't have to take a small subset of those rides to compare to a small number of recent rides; I can compare those recent rides to the mean determined from the entire dataset. Unfortunately, Primer of Biostatistics does not include this version of the t-test, so I found an online calculator to do that for me:

https://www.omnicalculator.com/statistics/t-test.

I then recalculated using tools provided by Google Sheets as outlined in the following website:

https://toptipbio.com/one-sample-t-test-excel/

Good news, the two approaches gave the same answer. The only thing left to do is decide which of my recent rides I should compare to that known average.
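As a further cross-check on the online calculator, here is a self-contained Python sketch of a two-sided one-sample t-test. To avoid depending on a statistics library, it computes the P-value from the regularized incomplete beta function (evaluated with the continued fraction described in Numerical Recipes), using the identity P(|T| > t) = I_{df/(df+t²)}(df/2, 1/2). The function names are my own; this is an illustration, not the calculation those websites actually perform.

```python
import math
from statistics import fmean, stdev


def _beta_cf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges fastest.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def one_sample_t_test(speeds, known_mean):
    """Two-sided one-sample t-test: is mean(speeds) consistent with known_mean?"""
    n = len(speeds)
    sem = stdev(speeds) / math.sqrt(n)  # standard error of the mean
    t = (fmean(speeds) - known_mean) / sem
    df = n - 1
    # P(|T| > |t|) for Student's t distribution with df degrees of freedom.
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p
```

With a hypothetical list `recent_speeds` of ride speeds, `one_sample_t_test(recent_speeds, 12.26)` returns the t statistic, the degrees of freedom (n - 1), and the two-sided P-value. A negative t means the recent rides were slower than the known average.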

Visualization is always a good place to start an analysis, and so finally the graph at the top of this post becomes relevant. It displays my speed on every Alpine-Like ride since moving to my new home in Emerald Hills. The actual ride speeds are in blue. The line in red is a running average of 8 of those ride speeds centered on each data point. My hypothesis based on that graph is as follows: during September and October of 2020 my speeds were increasing due to improved Fitness, during November and December my speeds were decreasing due to a buildup of Fatigue, and since then my speeds have remained low because I failed to alter my training to allow me to recover from that Fatigue. In this post I will not attempt to determine whether it was actually a buildup of Fatigue rather than a lack of Fitness that caused my rides to be slow; I will simply ask: are my recent rides truly slow, or did I just have a few slow rides due to chance? There are statistical approaches for analyzing the rate at which ride speeds are increasing or decreasing, but I will not attempt to develop those for this post. That being the case, the easiest (and most relevant) rides to test are those that I am claiming were ridden at an unchanging low speed due to Fatigue: the rides between the beginning of February of 2021 and the middle of May. There are 24 rides between those dates, with an average speed of 11.88 ± 0.46 mph (mean ± standard deviation). Using the one-sample t-test, the probability of seeing an average this far from the 12.26 mph of all 230 Alpine-Like rides purely by chance, if that were still my true average, is ~0.004%, virtually non-existent. My recent rides have truly been slower than average.
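A centered running average with an even window is slightly ambiguous, since 8 rides cannot be split evenly around a single point. The Python sketch below assumes a window made up of the 4 rides before each point, the point itself, and the 3 rides after it, and it leaves the first few and last few points blank rather than averaging a shorter window; the graph may well have used a different convention. The function name is hypothetical.

```python
def centered_moving_average(values, window=8):
    """Running average of `window` values around each point, or None near the ends.

    Assumed convention for an even window: window // 2 values before the point,
    the point itself, and the remaining values after it.
    """
    before = window // 2           # 4 rides before the point for window=8
    after = window - before - 1    # 3 rides after the point for window=8
    out = []
    for i in range(len(values)):
        lo, hi = i - before, i + after
        if lo < 0 or hi >= len(values):
            out.append(None)       # not enough rides on one side for a full window
        else:
            out.append(sum(values[lo:hi + 1]) / window)
    return out
```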

How much slower have my recent rides been? The Standard Deviation for the 24 recent rides is 0.46 mph, from which I can calculate that the Standard Error of the Mean (SEM) of those rides is 0.093 mph. Multiplying the SEM by the critical t value for 23 degrees of freedom (about 2.07), I can be 95% confident that my true average speed between those dates was between about 11.7 and 12.1 mph. Although statistically different, that is not very different in magnitude from my all time average of 12.26 mph. Sure, my ride speeds were really slower, but were they enough slower to matter? That is not a question of statistics, it is a question of biology and exercise science.
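The step from standard deviation to confidence interval can be written out directly. This minimal Python sketch takes summary statistics (mean, standard deviation, number of rides) and a critical t value from a table; for 24 rides (23 degrees of freedom) the two-sided 95% value is about 2.07. The function name is hypothetical, and using 1.96 from the normal distribution instead would give a slightly narrower interval.

```python
import math


def mean_ci_from_summary(mean, sd, n, t_crit):
    """Return (SEM, lower, upper) for a confidence interval on a mean.

    t_crit: two-sided critical t value for n - 1 degrees of freedom (from a table).
    """
    sem = sd / math.sqrt(n)        # standard error of the mean
    half_width = t_crit * sem
    return sem, mean - half_width, mean + half_width
```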

The biological rather than statistical significance of my slow rides is a question not for a statistician but for an exercise physiologist (scientist) or a coach. I am unaware of any guidance from any scientist or coach on this precise question, and besides, every athlete is different and I am very different from the athletes scientists and coaches usually discuss, so I will attempt to answer this question myself. Let me start by asking: what was the event that finally convinced me I should not attempt to ride the Art of Survival this year? It was that I did not complete the training plan I had developed to get ready for this ride. Specifically, I failed to complete the last, 55 mile long training ride. This is similar to the reason I failed to complete 200K brevets between 2012 and 2014: it was not that I attempted the brevet and gave up along the way, but that as I approached the end of my training plan, I did not complete the longest rides in it. I have previously discussed ad nauseam why that might have happened and won't repeat that here; I will just take it as a fact. Back then, I noted that relatively small changes in my MAF test rides seemed to predict success or failure in preparing for a brevet. What I have accomplished in this post is to develop a California replacement for the MAF test. It is not that a 0.4 mph decrease in the speed at which I would have ridden the Art of Survival would have been the difference between success and failure; that would only have made a 5 hour ride about 10 minutes longer, a matter of no consequence. It was that this decrease in speed on my standard rides is an indicator of the level of my Form. Attempting a physically challenging ride with such poor Form would, in my opinion, have been unwise. So yes, I believe that the relatively small decrease in average speed I have been riding recently is important, not in its own right, but as an indicator of my overall wellbeing.

Am I guilty of attacking a gnat with a sledgehammer? Have I belabored an obvious point? I don't think so. I have been eyeballing ride speed as an indicator of Form since I restarted cycling back in 2008, even though I knew that I might be deceiving myself. My tracking got even shakier when I moved to California, lost access to the Rice Track and MAF tests, and stopped using a heart rate monitor. Coaches recommend riding a test ride every month or so to assess Form. The problem with that is that it is a single ride. Both intuition ("it was just a bad day") and statistics counsel us on the folly of basing a conclusion on one of anything. Back in Houston, I liked that the many MAF tests I rode for training gave me a statistically robust indicator of Form, and by combining four of my most common rides I can now replicate that to some extent here in California. And yet, with all that, I was still eyeballing. This post is one of my iceberg posts, where only a tiny fraction of the effort I put into it shows above the surface. I recently posted how I had made a copy of my training log in a relational computer database. I could not have done the analysis for this post without that database. I had to relearn (and in some cases learn) the statistics I needed to do this analysis. I tried many different approaches to analyzing the data as my originally fuzzy thinking about the questions I was asking became clearer. And finally, in the process of asking one specific question about my recent rides, I assembled statistical tools that will make it easier to use objective statistical analysis in place of eyeballing going forward. I may never figure out why I felt fatigued in the runup to Art of Survival 2021 or know if skipping it was the right decision, but at least now, one piece of that puzzle is real and not imaginary. As I have said many times before, I blog because it is fun; I do not fool myself that it is any substitute for time on the bike, and I never let it become one.
I have never forgone even a single ride to work on this blog. And yet, I take satisfaction in knowing I am one small step closer to understanding why my cycling doesn't always go the way I'd like it to.

Tuesday, May 4, 2021

Training Zones, Calories, Oxygen, and Power

From "Intensity Training 2016" by John Hughes, modified from Allen, Hunter and Andrew Coggan, Ph. D. (2006) "Training and Racing with a Power Meter." VeloPress, Boulder, CO.

Some comments on Coach Hughes' training Zones:
1) Sweet Spot is not an independent training zone, but rather an alternative to parts of Zones 3 and 4.
2) This is a practical, not theoretical set of zones. In that context, this is a 7 zone system with "Sprints" being Zone 7. Because it is difficult and unnecessary to measure heart rate (and to a lesser extent power) during an all out sprint, Hughes does not give metrics for this zone.

The main purpose of training zones is to help coaches communicate levels of intensity to their athletes. The assumption is that the coach has a training plan that they believe will maximally benefit their athlete given that athlete's strengths, weaknesses, schedule, and goals, and that it is just a matter of getting the athlete to execute that plan. In this context, all that is required is that the zones are sufficiently fine grained that specifying a zone adequately limits the variability in the athlete's intensity, that from ride to ride a zone specifies the same level of intensity, and that there is some practical way for the athlete to measure what zone they are in during their training. Perhaps the two most common ways of accomplishing this last goal are heart rate (HR) and power output (measured by a power meter attached to the bicycle). Rating of Perceived Exertion (RPE) is a third metric that can be used as well.

The values of the metrics defining training zones are not the same from athlete to athlete. Zone 2 might run from an HR of 130 to 140 for one athlete and from 100 to 120 for another. It is the job of the coach to locate those boundaries for each of their athletes. To do that, coaches sometimes use metrics in addition to heart rate, power, and RPE: VO2 (sometimes including VCO2) and blood lactate are two common examples. (I will be explaining these metrics below.) The idea is that the coaches are looking for specific physiological boundaries that can occur at different relative heart rates or power outputs in different athletes, and these other metrics can help them find those boundaries. Once found, such boundaries can be associated with specific values of HR or power, which can then be used by their athletes to track the intensity at which they are riding.

I have no complaint with any of the above. Where I depart from common practice is that coaches sometimes try to use training zones to monitor the training load (accumulated fatigue) of their athletes. For example, some training books suggest that training load = length of ride x training zone number. Under that scheme, a 30 minute ride in Zone 4 would be assumed to produce the same amount of fatigue as a 60 minute ride in Zone 2. I have previously blogged a critique of this way of estimating training load, and today's post is background for a followup to that critique. One reason coaches and athletes might be misled into using training zones in this way is that the common metrics used to define these zones track each other pretty closely. In theory, coaches could set training zones to anything they like, but in fact most common training zones are set up so that each zone represents a similarly-sized range of heart rates and power levels, and so the zones themselves seem to fit this same pattern. Thus, it is easy to believe that this pattern is more universal than it actually is. The question I will address in today's post is: why do these metrics track each other so closely?
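To make the scheme being critiqued concrete, here is a tiny Python sketch of zone-based training load as described above. It is shown only to illustrate the assumption the scheme builds in (that load scales linearly with zone number), not as a recommended way of estimating fatigue; the function name is hypothetical.

```python
def zone_training_load(minutes, zone):
    # The simple scheme described above: load = ride length (minutes) x zone number.
    return minutes * zone


# Under this scheme a 30 minute Zone 4 ride "equals" a 60 minute Zone 2 ride.
assert zone_training_load(30, 4) == zone_training_load(60, 2)
```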

To attempt to explain why the standard metrics used to track training zones tend to be similarly spaced, I will be explaining in some detail the physiological meaning of these metrics. This explanation is quite long and fairly technical, I needed to make it that way to provide the foundation for future posts. However, if all you want is the bottom line, here it is:
The metrics used to delineate training zones are all related to power and energy, either the watts measured by a bicycle power meter or the calories and oxygen consumed by an athlete to generate those watts. That is why they track each other so closely. In a future post, I will argue that it does not necessarily follow that other outputs like training benefit or fatigue will do the same.

Power

The simplest of the intensity metrics to understand is Power. It is how much energy an athlete's muscles apply to the pedals each second. Uphill, downhill, windy, none of it matters; power is power. Of course, it might be much harder for one athlete to generate 300 watts of power than for another, so training zone boundaries measured by power will vary from athlete to athlete. One way coaches try to correct for that is to measure power not in watts, but relative to power at some defined level of intensity, for example, the highest power that can be sustained for about an hour, known as Functional Threshold Power (FTP). Thus, many tables of training zones use %FTP to specify training zone boundaries.
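Converting power to %FTP and then to a zone is a simple lookup. The Python sketch below assumes Coggan's commonly published power levels (Level 1 up to 55% of FTP, then boundaries at 75%, 90%, 105%, and 120%, with everything above 120% lumped into Level 6). Coach Hughes' table may draw the lines differently, and, as with his Sprints zone, the top neuromuscular level is not defined by power here. The names are illustrative.

```python
# Assumed upper bounds (fraction of FTP) of Coggan-style levels 1-5;
# anything above the last bound is treated as level 6.
LEVEL_UPPER_BOUNDS = [0.55, 0.75, 0.90, 1.05, 1.20]


def power_level(watts, ftp):
    """Return (percent of FTP, training level) for a given power output."""
    frac = watts / ftp
    # Count how many boundaries the power exceeds; each one moves up a level.
    level = 1 + sum(frac > bound for bound in LEVEL_UPPER_BOUNDS)
    return 100 * frac, level
```

For a rider with a hypothetical FTP of 200 watts, `power_level(150, 200)` gives 75% of FTP, which falls in Level 2.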

Calories to Power

It is commonly known that the power an athlete's muscles apply to their bicycle's pedals comes from the food they eat. The amount of energy in food is measured in calories (food calories, which are kilocalories). Energy and power are not the same thing, but they are closely related: power is the energy used per unit time. Bicycle power meters measure power in units of watts (joules per second). 100 watts of power is more or less what is needed to move a bicycle forward on a level road at about 15 miles per hour. If the conversion of food calories to watts were 100% efficient (which it is not), generating those 100 watts would require eating 86 calories per hour. In fact, the efficiency of the conversion of food energy to bicycle motion is more in the ballpark of 20%, so you might expect that riding a bicycle at 15 miles per hour would burn approximately 400 calories per hour. If you go onto one of the many websites that estimate calories burned while bicycling, you will find that they vary a lot from site to site but give numbers in that ballpark^.
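The food-energy arithmetic above can be checked with a few lines of Python. This sketch assumes 4,184 joules per food calorie (kilocalorie) and treats efficiency as a single fixed fraction, which is a simplification; the function name is hypothetical.

```python
JOULES_PER_KCAL = 4184  # one food calorie (kilocalorie) is 4,184 joules


def kcal_per_hour(watts, efficiency=0.20):
    """Food calories burned per hour to deliver `watts` at the pedals.

    efficiency: assumed fraction of food energy that ends up as work on the pedals.
    """
    joules_per_hour = watts * 3600  # a watt is one joule per second
    return joules_per_hour / JOULES_PER_KCAL / efficiency
```

`kcal_per_hour(100, 1.0)` gives about 86 calories per hour, and `kcal_per_hour(100)`, at the assumed 20% efficiency, gives about 430.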

VO2, VCO2, and VO2max

How can the number of calories being consumed by exercise be measured? The calories in food are turned into energy the body can use by reacting with oxygen to generate carbon dioxide (CO2) and water (H2O). Oxygen consumption is much easier to measure than calories, and the standard measure of it is VO2: the volume of oxygen absorbed by the body per minute, that is, the amount breathed in minus the amount breathed out (not all the oxygen breathed in is absorbed). The same equipment can often measure the volume of CO2 expelled at the same time, a metric named VCO2. It is useful to measure both because the ratio of the two depends on how much fat and how much carbohydrate the athlete is burning. As the intensity of a ride increases, more carbohydrate is burned. As a result, the ratio of fat to carbohydrate goes down and the ratio of CO2 expelled to O2 absorbed (the respiratory exchange ratio) goes up. This ratio is one of the physiological markers coaches can use to determine training zones. The ever popular metric, VO2max, is just the highest VO2 an athlete can achieve when cycling as hard as they can. Although it is possible to purchase a consumer-grade device for measuring VO2, it costs as much as a high-end bike, so VO2 is usually measured in a gym or medical lab.
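The sketch below shows how VO2 and VCO2 turn into the fuel-mix and calorie estimates described above. It relies on two standard textbook approximations that the post itself does not mention:

- A linear interpolation of the respiratory exchange ratio (RER) between pure fat (about 0.7) and pure carbohydrate (1.0).
- The abbreviated Weir equation for energy expenditure.

The gas-exchange readings are hypothetical.

```python
def respiratory_exchange_ratio(vco2_l_min: float, vo2_l_min: float) -> float:
    """RER = volume of CO2 expelled / volume of O2 absorbed."""
    return vco2_l_min / vo2_l_min


def carbohydrate_fraction(rer: float) -> float:
    """Rough share of energy from carbohydrate (linear approximation).

    Interpolates between fat (RER ~0.7) and carbohydrate (RER 1.0), ignores
    protein, and is only meaningful at steady state; clamped to [0, 1].
    """
    return min(1.0, max(0.0, (rer - 0.7) / 0.3))


def weir_kcal_per_min(vo2_l_min: float, vco2_l_min: float) -> float:
    """Abbreviated Weir equation: kcal/min from gas exchange in L/min."""
    return 3.941 * vo2_l_min + 1.106 * vco2_l_min


# Hypothetical gas-exchange readings (L/min) for an easy and a hard ride.
for label, vo2, vco2 in [("easy", 1.5, 1.2), ("hard", 3.0, 2.85)]:
    rer = respiratory_exchange_ratio(vco2, vo2)
    print(f"{label}: RER {rer:.2f}, ~{carbohydrate_fraction(rer):.0%} carbohydrate, "
          f"{weir_kcal_per_min(vo2, vco2):.1f} kcal/min")
```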

METs and %VO2max

METs are a unit used much more by the medical community than by the exercise community. Fundamentally, METs measure exactly the same thing as VO2. One issue with both measures is that a larger person will use more calories and oxygen than a smaller person. To correct for that, METs are almost always expressed per kilogram (kg) of body weight, and VO2 often is as well. This is far from a perfect correction, however. How much of the body weight is fat versus muscle matters, and size has different impacts for different kinds and intensities of exercise. As a result, there is a lot of talk in the exercise community about reconsidering this correction, but as of today, dividing by body weight is usually what is done.

Another problem with both METs and VO2 is that they ignore levels of fitness. Riding at 15 mph might be very fast for a completely untrained person but very slow for a professional cyclist. Even though METs and VO2 measure the same thing, the medical community, the main users of METs, almost always ignores this. The exercise community corrects for it by using not VO2 itself but the ratio of VO2 at a given intensity to VO2max, a number known as %VO2max.
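A minimal sketch of the corrections described above follows. It uses the conventional definition of 1 MET as 3.5 mL of O2 per kg per minute, which the post does not state. The riders, the body weights, and the VO2max values are hypothetical. The point is that two riders can be at the same METs while sitting at very different %VO2max.

```python
ML_O2_PER_KG_PER_MIN_PER_MET = 3.5  # conventional definition: 1 MET = 3.5 mL O2/kg/min


def relative_vo2(vo2_l_min: float, body_mass_kg: float) -> float:
    """Absolute VO2 (L/min) -> mL of O2 per kg of body weight per minute."""
    return vo2_l_min * 1000.0 / body_mass_kg


def mets(vo2_ml_kg_min: float) -> float:
    """Convert weight-normalized VO2 to METs."""
    return vo2_ml_kg_min / ML_O2_PER_KG_PER_MIN_PER_MET


def percent_vo2max(vo2_ml_kg_min: float, vo2max_ml_kg_min: float) -> float:
    """Current VO2 as a percentage of the rider's own VO2max."""
    return 100.0 * vo2_ml_kg_min / vo2max_ml_kg_min


# Two hypothetical 70 kg riders doing the same 2.45 L/min effort (35 mL/kg/min).
effort = relative_vo2(2.45, 70.0)
for label, vo2max in [("untrained", 40.0), ("trained", 70.0)]:
    print(f"{label}: {mets(effort):.0f} METs, {percent_vo2max(effort, vo2max):.1f}% VO2max")
```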

A final practical difference between METs and VO2 is that the medical community is interested in even low levels of exercise whereas the exercise community is not, so the values reported for %VO2max do not extend into these low intensities.

Another way to think about the low level of intensity problem is to ask, what is the level of intensity at the bottom of Zone 1? Theoretically, it is the cyclist sitting on the couch, watching TV. In practice, it is considerably higher than that. If that were not true, then Zone 1 would be much larger than the other training zones. In fact, if Zone 1 were to be made the same size as the other zones, there would be room for two more zones below Zone 1 (as I discuss below in Putting It All Together.)

Heart Rate

The oxygen the athlete breathes in is transported from their lungs to their muscles by their blood which is pumped by their heart. The amount of blood which is pumped by each beat of the heart can be increased by training, but on any given day, each beat of the heart pumps the same volume of blood, so that (in some cases, to some degree of accuracy) heart rate is determined by the oxygen consumed by an athlete's muscles and thus watts measured by their power meter.

There is enough truth in the above story to be very useful, but it is a simplification. Anyone who has ever used heart rate to determine training intensity zones knows that heart rate can vary enormously at a constant level of effort (intensity). For example, emotional stress affects heart rate dramatically. Similarly, at a constant level of power output, an athlete's heart rate will be higher on a hot day than on a more temperate one. Training has a huge impact on heart rate; one of the more useful statistics for monitoring training is the decrease in heart rate that occurs at a constant level of power output. Finally, there is the phenomenon of decoupling: on a long bike ride at a constant effort, heart rate will be steady at first but will start increasing as the ride goes on. When used to define ride intensity, heart rate has to be used with caution and with awareness of all these factors, but despite that, it can be extremely useful. The point is, heart rate turns out to be just another way of looking at the same energy in/energy out system.
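The "heart rate follows oxygen" story can be made concrete with the Fick principle, which the post does not name. The principle states that VO2 equals heart rate times stroke volume times the arteriovenous oxygen difference. In this sketch both stroke volume and the oxygen extracted per liter of blood are held fixed at hypothetical values. In reality the arteriovenous difference widens as intensity rises, so the strict proportionality shown here is only an approximation.

```python
def heart_rate_from_vo2(vo2_l_min: float,
                        stroke_volume_l: float = 0.10,
                        a_vo2_diff: float = 0.15) -> float:
    """Heart rate (beats/min) implied by the Fick principle.

    VO2 = HR * stroke volume * arteriovenous O2 difference, so
    HR = VO2 / (stroke volume * a-vO2 difference).
    stroke_volume_l: liters of blood per beat (hypothetical, held fixed).
    a_vo2_diff: liters of O2 extracted per liter of blood (hypothetical, held fixed).
    """
    return vo2_l_min / (stroke_volume_l * a_vo2_diff)


for vo2 in (1.5, 2.0, 3.0):
    print(f"VO2 {vo2} L/min -> about {heart_rate_from_vo2(vo2):.0f} beats/min")
```

With both factors fixed, doubling oxygen demand doubles heart rate. The caveats in the paragraph above, such as heat, stress, and decoupling, are all ways the real system departs from that assumption.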

Baseline Metabolic Rate

An athlete's calorie consumption does not start at zero. Even when they are sleeping or sitting on the couch watching TV, their heart is beating, they are breathing, and they are burning calories. The other extreme, the maximum rate, depends very much on the fitness and genetics of the specific athlete. An average untrained person working as hard as they can burns calories about 10 times as fast as when watching TV. The best professional athletes can do much better, burning calories 25 times faster at their maximum than at rest. The point is, when comparing calories in and power out, it is necessary to subtract this baseline. To further complicate things, different metrics of metabolic rate (intensity) have different baselines. Heart rate, for example, has a relatively high baseline compared to the highest possible heart rate, maximum heart rate (HRmax). In fact, many coaches and exercise scientists have switched from talking about heart rate (HR) to talking about reserve heart rate (rHR), which is just the measured heart rate minus the resting heart rate. For marking the boundaries of training zones, this makes no difference, but for understanding the relationship between heart rate and power, rHR is the better metric.
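A small sketch of why subtracting the baseline matters follows. The resting and maximum heart rates are hypothetical. Some sources use "heart rate reserve" to mean HRmax minus resting heart rate. The percentage computed here, (HR - resting) / (HRmax - resting), is the same quantity those sources call %HRR.

```python
def reserve_hr(hr_bpm: float, resting_hr_bpm: float) -> float:
    """The post's rHR: measured heart rate minus resting heart rate."""
    return hr_bpm - resting_hr_bpm


def percent(value: float, maximum: float) -> float:
    """value as a percentage of maximum."""
    return 100.0 * value / maximum


# Hypothetical rider: resting HR 50 bpm, HRmax 190 bpm, currently riding at 120 bpm.
resting, hr_max, hr_now = 50, 190, 120
print(f"HR:  {percent(hr_now, hr_max):.0f}% of HRmax")
print(f"rHR: {percent(reserve_hr(hr_now, resting), reserve_hr(hr_max, resting)):.0f}% of maximum rHR")
```

Measured by raw heart rate, this rider appears to be at 63% of maximum. Measured by rHR, they are exactly halfway between rest and maximum. The raw figure is inflated by the resting baseline.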

Blood Lactate

There are three kinds of muscle fiber, type I, type IIa, and type IIx. Between them, they use three sources of energy to generate ATP and creatine phosphate: anaerobic metabolism of glucose, aerobic metabolism of glucose, and aerobic metabolism of fat. Aerobic means with oxygen. Aerobic metabolism is what I talked about above, the reaction between food and oxygen to generate energy. Anaerobic means without oxygen. Below, I will be describing ways muscles can use glucose to generate energy that don't require oxygen.

Aerobic metabolism of fat and glucose does not result in lactate production. However, it is slow. It is important during low intensity cycling but cannot keep up with the energy demands of high intensity cycling. Anaerobic metabolism of glucose is fast and thus allows for high intensity cycling. One downside is that it makes inefficient use of glucose and thus accelerates depletion of glycogen in the muscle fibers, and this depletion is a major source of medium-term fatigue. Another important source of fatigue is that anaerobic metabolism of glucose generates lactate that ultimately builds up in the blood, a major source of short-term fatigue. Between them, glycogen depletion and lactate accumulation limit how long high intensity cycling can continue.

Generation of lactate by anaerobic glucose metabolism and its release into the blood is a normal part of metabolism, and not just in muscles. Red blood cells, for example, generate all their energy this way. Even at rest, muscle cells generate and release some lactate. Thus, there is a resting baseline level of lactate in the blood of about 1 millimolar (mM). During low intensity cycling, most energy comes from the aerobic metabolism of glucose and fat, and only a baseline level of lactate is released into the blood, so blood lactate remains at this resting level. As cycling becomes more intense, more and more glucose has to be metabolized anaerobically, creating lactate that is released from the muscle into the blood. During this medium intensity exercise, a steady state is reached in which each intensity level has an associated level of lactate in the blood. This phase continues up to a blood concentration of approximately 4 mM. Past that, the body's metabolism of lactate cannot keep up with its production, so there is no steady state: blood lactate keeps going up even at a constant intensity, and it rises faster the more intense the cycling. In this third phase of intensity, lactate will relentlessly increase, more or less quickly depending on the intensity of cycling, up to a maximum of approximately 20 mM, at which point it becomes impossible to continue. Thus, there are clearly three fundamentally different intensity zones in terms of lactate. In Zone 1, lactate remains at resting levels and is not an issue in terms of fatigue. In Zone 2, lactate levels increase but the body can manage that lactate. As far as I know, the effect of these steady but increased levels of lactate on fatigue is not known. Finally, there is Zone 3, where lactate production exceeds the body's ability to manage it, ultimately limiting the ability to continue at that intensity. This is why advocates of Polarized Training like Dr. Stephen Seiler often use a three training zone system based on these lactate levels. To avoid confusion, I will refer to these as Lactate Zones. As a final point, I would note that blood lactate levels, like many other things (heart rate, for example), vary significantly from person to person. Thus, the absolute values I have given here are typical but by no means accurate for any given cyclist. However, the general patterns are almost universal: no increase, increased but steady, steadily increasing*.
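The three patterns (no increase, increased but steady, steadily increasing) can be reproduced with a toy mass-balance model. This model is an illustrative assumption, not a published model of lactate kinetics:

- Excess lactate appears in the blood at a constant rate set by intensity.
- The body clears lactate in proportion to how far it sits above baseline, but never faster than a hypothetical maximum clearance rate.

All parameter values are made up. They were chosen so that the highest steady state lands near the 4 mM mentioned above.

```python
def simulate_lactate(production: float, minutes: float, baseline: float = 1.0,
                     k: float = 0.5, max_clearance: float = 1.5,
                     dt: float = 0.1) -> float:
    """Toy model: blood lactate (mM) after riding `minutes` at constant intensity.

    production: excess lactate appearance above resting, in mM per minute.
    Clearance is k * (lactate - baseline), capped at max_clearance mM/min,
    so the highest possible steady state is baseline + max_clearance / k.
    Integrated with simple Euler steps of dt minutes.
    """
    lactate = baseline
    for _ in range(round(minutes / dt)):
        clearance = min(max_clearance, k * (lactate - baseline))
        lactate += (production - clearance) * dt
    return lactate


for label, production in [("easy", 0.0), ("moderate", 1.0), ("hard", 2.0)]:
    after_20 = simulate_lactate(production, 20)
    after_30 = simulate_lactate(production, 30)
    print(f"{label}: {after_20:.1f} mM at 20 min, {after_30:.1f} mM at 30 min")
```

In the model, the easy ride stays at baseline and the moderate ride levels off at an elevated steady state. The hard ride keeps climbing once clearance is saturated, which is the Lactate Zone 3 behavior described above.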

Knowledgeable readers may be raising their eyebrows at this point. Advocates of a blood-lactate-based three training zone system do not talk about lactate levels the way I have above. Rather, they talk about how specific levels of blood lactate are associated with specific levels of intensity. When they plot blood lactate concentration as a function of watts generated on a bicycle trainer, they see that it goes up very slowly at first and then at some point starts going up faster. This transition is called the first inflection point and is the boundary between Lactate Zone 1 and Lactate Zone 2. As intensity continues to increase, the rate at which lactate rises with intensity increases a second time. This is called the second inflection point and is the boundary between Lactate Zone 2 and Lactate Zone 3. There are two reasons for the difference between how I explain lactate levels and how most coaches and exercise scientists do. The first is that, as I have previously noted, biology is never black and white. I said that at low intensity, blood lactate does not increase. That was a simplification: excess lactate generation starts gradually rather than all at once, so even at these low intensities, in Lactate Zone 1, blood lactate rises somewhat as intensity increases. The second reason has to do with the way exercise scientists typically conduct their experiments. Virtually every experiment I have seen has involved an intensity ramp in which intensity is increased every few minutes and, during each intensity period, a blood sample is taken and lactate measured. In Lactate Zone 3, where I claim lactate is not at equilibrium but is relentlessly increasing, what is really being measured is how much lactate has accumulated at a specific intensity after the specific number of minutes used in the experimental protocol. Though reported as a lactate concentration, that value really reflects a rate: how fast lactate accumulates at that intensity, and thus how much has accumulated during those few minutes. The rate of lactate accumulation times the time the cyclist has spent at the intensity being investigated gives the level of lactate that is reported. If the time at each intensity in the experimental protocol were increased, the measured lactate would also increase, so the reported levels depend on the details of the experimental protocol.
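The protocol dependence described above can be shown with a one-line calculation. This is a minimal sketch, assuming lactate rises at a constant rate during a Lactate Zone 3 ramp stage. The starting level (4 mM) and the accumulation rate (0.8 mM per minute) are hypothetical.

```python
def reported_lactate(start_mM: float, accumulation_rate_mM_per_min: float,
                     stage_minutes: float) -> float:
    """Lactate sampled at the end of one ramp stage where lactate never stabilizes."""
    return start_mM + accumulation_rate_mM_per_min * stage_minutes


# Same rider, same intensity, two hypothetical protocols with different stage lengths.
for stage in (3, 5):
    print(stage, "min stage:", round(reported_lactate(4.0, 0.8, stage), 1), "mM")
```

In this sketch, nothing about the rider changes between the two protocols. Only the stage length differs, yet the "lactate at this intensity" reported for the same effort rises from 6.4 mM to 8.0 mM.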

In Lactate Zone 2, the body is able to clear all the lactate generated by the muscles as fast as it is generated. How does the body do that? There are two general mechanisms. The first is to burn it as fuel; the heart is particularly good at that. The second is to convert it back into glucose, which happens mostly in the liver and in the kidneys. That glucose is then returned to the blood to be used by muscles and other parts of the body. These same two processes also occur when exercising in Lactate Zone 3, but they cannot occur fast enough to keep up with lactate generation by the muscles, so lactate levels increase over time.

Oxygen Debt and Afterburn

Anaerobic metabolism of glucose is immensely wasteful. Oxidation of one molecule of glucose yields 38 molecules of ATP (the classic textbook figure; more recent estimates are closer to 30 to 32). Anaerobic metabolism of glucose to generate lactate yields only 2 molecules of ATP. Worse, converting that lactate back into glucose in the kidneys or the liver requires 6 molecules of ATP, for a net loss of 4 molecules of ATP for each molecule of glucose used anaerobically by the muscles. Where does the energy to create that ATP come from? From the oxidation of fat by the liver and kidneys. When exercise is over, the body has to get rid of all the lactate left in the blood, and that takes oxygen. ATP and creatine phosphate levels in the muscle are likely depleted, and oxidation of fat or glucose must occur after the end of exercise to replenish them. (These are just two examples; there are probably others.) As a result, oxygen use remains elevated after exercise has finished, a process that coaches and athletes refer to as "afterburn." When riding in Lactate Zone 1 or Lactate Zone 2, the oxygen used to burn carbs and fat is delivered more or less as it is being used. When riding in Lactate Zone 3, however, oxygen consumption lags behind energy generation. The athlete's body accumulates an oxygen "debt" that is "paid back" during the afterburn period, a period that can last for hours.
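The ATP bookkeeping above can be tallied in a few lines. This sketch uses only the figures quoted in the text (38, 2, and 6 ATP) and ignores creatine phosphate and every other contributor to afterburn. The function name is hypothetical.

```python
# Figures quoted in the text above (38 is the classic textbook value).
ATP_PER_GLUCOSE_AEROBIC = 38
ATP_PER_GLUCOSE_TO_LACTATE = 2
ATP_TO_RECYCLE_LACTATE_TO_GLUCOSE = 6


def anaerobic_bill(atp_needed: float) -> tuple[float, float]:
    """Glucose burned, and ATP later spent recycling the lactate back to glucose,
    when atp_needed is supplied entirely by glucose -> lactate."""
    glucose_used = atp_needed / ATP_PER_GLUCOSE_TO_LACTATE
    recycling_cost_atp = glucose_used * ATP_TO_RECYCLE_LACTATE_TO_GLUCOSE
    return glucose_used, recycling_cost_atp


net_per_glucose = ATP_PER_GLUCOSE_TO_LACTATE - ATP_TO_RECYCLE_LACTATE_TO_GLUCOSE
glucose, recycling = anaerobic_bill(ATP_PER_GLUCOSE_AEROBIC)
print(f"Net ATP per glucose round trip: {net_per_glucose}")  # -4
print(f"To match one aerobically oxidized glucose: {glucose:.0f} glucose used anaerobically, "
      f"{recycling:.0f} ATP needed to recycle the lactate")
```

On these figures, supplying the ATP of one aerobically oxidized glucose anaerobically burns 19 glucose molecules. Recycling the resulting lactate then costs 114 ATP, paid for by oxidizing fat, which is one reason oxygen use stays elevated after a hard effort.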

Relative Perceived Exertion

From a theoretical perspective, relative perceived exertion could be related to almost anything. It is a highly subjective, complex metric. All we know about it is that it is produced by subconscious processes in the brain from unknown data using unknown algorithms. It is interesting, then, that when measured, the relative perceived exertion reported by most athletes is fairly similar to heart rate and power. Why an athlete's subconscious brain associates how hard an effort feels with the rate at which that effort burns calories is a mystery of evolution. I won't be discussing it in this post, but training effect, short- and long-term fatigue, and health benefits, things that our conscious brain thinks are much more important than calories, are not closely related to calories burned. Thus, it is a good thing that the way training zones are usually used does not require them to be proportional to fitness, health, or fatigue, because (as I have discussed in the past and will be discussing again in the future) they are not.

Putting It All Together

In the above graph, I have replotted the training zone data from the chart at the top of the post. The value plotted for each zone is the value at the upper boundary of that zone. Thus, Zone "0" is really the lower bound of Zone 1. As noted above, Coach Hughes does not really establish a lower boundary for Zone 1. In theory, Zone 1 extends all the way down to watching TV while sitting on the couch. For the purposes of the graph above, I set a lower boundary for Zone 1 such that it would have about the same width as the other zones. When I did that, I found that I needed two more zones to get to the couch, which are labeled "light" for light exercise and "rest" for resting, or no exercise at all. I adjusted Coach Hughes' heart rate data to use reserve heart rate (rHR) rather than absolute heart rate. When I did that, I was astonished at how similar the graphs for heart rate and power were.

If you recall the sections above on Blood Lactate and on Oxygen Debt and Afterburn, it might seem surprising that rHR and power continue to follow each other all the way to the top of Zone 6. I argued that rHR increases as demand for oxygen increases and that, in Zones 5 and 6, muscles are using glucose to generate power in the absence of oxygen, so why does rHR continue to rise? I have two guesses, both of which I believe to be true. First, the heart pumps blood for reasons besides just supplying oxygen to an athlete's leg muscles. Some of those reasons might be to move lactate from the muscles to other parts of the body and to provide oxygen to organs like the liver, which oxidize fat to supply the energy needed to convert lactate back to glucose. The second reason why rHR might parallel power through the top of Zone 6 is that Coach Hughes' training zone system has 7 zones but gives no rHR data above Zone 5 and no power data above Zone 6. (For the purposes of the graph, I estimated the rHR data for Zone 6.) It is only in the highest zone, Zone 7 (which Hughes designates not as Zone 7 but as "Sprint"), that anaerobic power production really takes off. Hughes defines the top of Zone 6 by a power output of 120% of FTP. Race data shows athletes, both professional and amateur, reaching peak power outputs of 400% to 500% of FTP (an output they can only maintain for 5 seconds or so). There is no reason for Hughes to delineate the upper boundary of his "Sprint" zone (Zone 7), because all the athlete needs to know is to go "all out." That said, the absence of that boundary prevents us from seeing whether the large predicted divergence between rHR and power in fact occurs, even though common sense tells us that it must.

Although there is significant athlete-to-athlete variation, for virtually all athletes HRmax, the highest heart rate it is possible to reach, is between 120% and 135% of the HR at lactate threshold. That is far lower than the 400% to 500% of FTP, the comparable power benchmark, that can be generated in a sprint.
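To put the two ranges above side by side, the arithmetic below divides the peak-power multiple by the HRmax multiple. It takes the ranges as quoted and treats FTP and the HR at lactate threshold as comparable anchors, which is the assumption the text itself makes. The variable and function names are hypothetical.

```python
# Ranges quoted in the text above.
HRMAX_OVER_THRESHOLD_HR = (1.20, 1.35)
PEAK_POWER_OVER_FTP = (4.0, 5.0)


def ratio_of_ratios(hr_range: tuple[float, float],
                    power_range: tuple[float, float]) -> tuple[float, float]:
    """Range of (peak power / FTP) divided by (HRmax / threshold HR).

    The smallest value pairs the lowest power multiple with the highest HR
    multiple; the largest value pairs the highest power with the lowest HR.
    """
    low = power_range[0] / hr_range[1]
    high = power_range[1] / hr_range[0]
    return low, high


low, high = ratio_of_ratios(HRMAX_OVER_THRESHOLD_HR, PEAK_POWER_OVER_FTP)
print(f"Peak power / FTP is {low:.1f}x to {high:.1f}x larger than HRmax / threshold HR")
```

Relative to threshold, peak sprint power rises roughly three to four times further than heart rate can. This is the divergence the missing Zone 7 data would be needed to show.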

Of the three commonly used zone metrics plotted, Relative Perceived Exertion (RPE) is clearly the most different. This is despite the fact that, to make it more comparable to the other metrics, I rescaled RPE as a percentage of the RPE at the top of Zone 4. (Given what RPE is, it is not at all clear that doing so makes any sense.) Given how opaque and subjective RPE is, what is interesting is not that it differs somewhat from power and heart rate, but how similar it is. That said, it is relatively easy to rationalize away even that difference. Because Coach Hughes is a coach, the RPE values he suggests are for a trained athlete. Such an athlete would feel no exertion whatsoever during "light" exercise. An untrained person might have a different experience, resulting in different RPE values at the same heart rate and (relative) power output. I will leave it as an exercise for the reader to imagine how adjusting RPE for such an untrained person would bring its values closer to those for rHR and power.

In Conclusion

The goal of training plans developed by coaches is to increase the various aspects of their athletes' fitness without generating more fatigue than their athletes can handle. To accomplish that goal, training plans specify different amounts (minutes) of exercise at specific intensities. Calorie consumption is a secondary consideration, if it is a consideration at all. As it happens, the tools available for measuring intensity all measure calorie consumption or the energy output fueled by those calories. This is not a problem as long as two things are recognized. First, what is being measured (heart rate, power output, ...) is not a measure of fitness or fatigue. Second, a scale of intensity defined by energy and power will not match a scale of intensity defined by fitness and fatigue.
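A TRIMP-style calculation (minutes times an intensity weight) shows why that mismatch matters. This sketch compares two intensity weightings on two hypothetical rides:

- A linear, energy-like weighting.
- The exponential weighting commonly quoted for Banister's TRIMP in male athletes, 0.64 times e raised to 1.92 times the fractional heart rate.

The rider's heart rates and the ride durations are made up. The sketch is not meant to say which weighting is right, only that the choice decides which ride looks more fatiguing.

```python
import math


def hr_fraction(hr: float, resting_hr: float, max_hr: float) -> float:
    """Fraction of the way from resting to maximum heart rate (0 to 1)."""
    return (hr - resting_hr) / (max_hr - resting_hr)


def linear_load(minutes: float, fraction: float) -> float:
    """Minutes times intensity, with intensity scaling like energy use."""
    return minutes * fraction


def banister_trimp(minutes: float, fraction: float,
                   a: float = 0.64, b: float = 1.92) -> float:
    """Banister-style TRIMP: minutes * x * a * e^(b * x).

    a and b are the coefficients commonly quoted for male athletes; the
    exponential term makes hard minutes count for much more than easy ones.
    """
    return minutes * fraction * a * math.exp(b * fraction)


# Hypothetical rider: resting HR 50, maximum HR 190.
easy = hr_fraction(120, 50, 190)  # 0.5
hard = hr_fraction(176, 50, 190)  # 0.9

print(f"linear:   60 min easy = {linear_load(60, easy):.1f}, 20 min hard = {linear_load(20, hard):.1f}")
print(f"Banister: 60 min easy = {banister_trimp(60, easy):.1f}, 20 min hard = {banister_trimp(20, hard):.1f}")
```

Under the linear weighting, the long easy ride scores higher. Under the exponential weighting, the short hard ride does.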

^ 360 cal/hr: https://www.welovecycling.com/wide/2020/05/14/how-to-convert-watts-into-calories-burned-on-the-bike/

568 cal/hr: https://captaincalculator.com/health/calorie/calories-burned-cycling-calculator/

540 cal/hr: https://gearandgrit.com/convert-watts-calories-burned-cycling/

360 cal/hr: https://mccraw.co.uk/2012/10/14/powertap-meter-convert-watts-calories-burned/

* (This paragraph is based on "Many Factors to Consider When Collecting, Analyzing, and Interpreting Blood Lactate Measurements" in Physiological Tests for Elite Athletes, Second Edition, Australian Institute of Sport, edited by Rebecca Tanner and Christopher Gore. ISBN-13: 978-0736097116, ISBN-10: 0736097112.)