Monday, June 7, 2021

What Is Truth?




Ride speed on my New Alpine and New Alpine Cañada routes since my move to Emerald Hills. Ride speeds are in blue. The curve in red is a moving average of 8 rides centered around each time point.



A recurring theme on this blog is the impact of long term fatigue on my cycling. Back in 2012, 2013, and 2014, I was attempting to be a randonneur, a long distance cyclist who does rides of 200 to 1200 kilometers (km). During those years, I only managed two such rides, both at the shortest 200 km distance, and the biggest factor in my failure to complete more was long term fatigue. When I moved to California three and a half years ago, I decided to switch to shorter rides, 100 km rides known as metric centuries. My hope was that, unlike with the longer 200 km rides, I could do the 100 km rides more frequently, and that, compared to the long term fatigue generated by training for and riding the 200 km rides, fatigue would be less of an issue with the 100 km rides. That has definitely been the case. However, less of an issue and not an issue are not the same thing. Although reduced, I believe that long term fatigue is still an issue for my 100 km rides. Or is it? Was fatigue ever the issue I thought it was? As a scientist, I always question my assumptions, and there is a lot to question. As much as I have discussed long term fatigue, I still have a lot of uncertainty about it, an uncertainty that recently manifested itself yet again.

Let's start with some definitions: Form = Fitness - Fatigue. Form is what determines how fast I can potentially complete a ride, how long a ride I can complete, etc. If I haven't been training, my Fitness will be low and I will not be able to ride very quickly or very far. On the other hand, I might be very fit but completely exhausted from the training required to develop that Fitness. In that case my Fatigue is high, and so again, my Form and thus my speed on a ride will be low. Also, factors other than cycling can cause Fatigue: other kinds of exercise (e.g. yard work), emotional stress, lack of sleep, illness, etc. Finally, this is not a quantitative equation. The biological mechanisms underlying Form, Fitness, and Fatigue are far from completely understood and there are no generally agreed-upon ways of measuring them and thus no units of Form, Fitness, or Fatigue. Thus, Form = Fitness - Fatigue is a conceptual equation. This post is about Form. To what extent my Form at any moment in time is due to Fitness or due to Fatigue has been and will be discussed elsewhere.

What inspired this post? As the COVID-19 pandemic has started to wane, I had hoped to restart my metric century group rides last month by riding the Art of Survival on May 29. However, I was not able to do that. I tried to replicate the very successful training plan I had developed for the 2019 Golden Hills metric century, which includes a weekly 33 mile ride in the months before the event and then a 44 mile ride 4 weeks before and a 54 mile ride 2 weeks before. I had been successful at doing the 33 mile ride almost every week and completed the 44 mile ride on schedule, but there were warning signs. The two observations I have been able to use to measure my long term fatigue are how fast I ride and how I feel. By "how I feel" I mean do my legs feel sore and tired? Do I feel generally lethargic? Am I more grumpy than usual? Do little things bother me more than they should? Am I unenthusiastic about starting a ride and do I find it an unpleasant slog once I go? As I attempted to train for the Art of Survival, that is how I felt. However, when I looked back at my subjective fatigue data, I found it unconvincing. I think I am just a pessimist, and it seems that my evaluation of "how I felt" during the vast majority of my rides can be summarized as "I felt bad." Maybe I'm just old. Bill Clinton famously said that, after age 50, if you wake up in the morning and nothing hurts, you know you died during the night.

The second way I evaluate my readiness for a challenging ride is how fast I am riding. Currently, there are two routes I ride regularly enough that I can use them to assess my speed, named "New Alpine" and "New Alpine-Cañada". I felt like my speed on these rides was slow. The central question of this post is: was this feeling of slowness because the rides really were slow, or was it a subjective illusion?

The New Alpine and New Alpine-Cañada routes are similar to each other and similar to routes I rode back in the Fall of 2019 (routes named "Alpine" and "Alpine-Cañada"), a time when I was riding well, and thus I might be able to compare speeds between now and then. But is not the speed of a ride dependent on how fast I choose to ride as much as on my fatigue level? To some extent, yes. (I have blogged about that.) That said, I firmly believe that the speed at which I complete these rides is governed much more by my Form than by a conscious decision; I tend to ride them at a relatively constant subjective feel, not holding back but at a speed I feel I could maintain indefinitely. Thus, when I notice that, over several rides, the speed of these rides is lower than average, I take note. Unfortunately, this is still a subjective assessment. Yes, the speed of a ride is objective, but what I consider fast and what I consider slow is subjective, as is how many slow rides it takes before I conclude I am suffering from fatigue. I wanted to use statistical analysis to make this assessment less subjective and more quantitative, and that is the subject of this post.

I think I have a pretty good understanding of the theory of statistics but I am not a statistician; I lack the years of experience that made the statisticians I have worked with over the years so valuable. Also, the ride data I want to analyze is quite different in structure than the experimental data for which standard statistical tools were developed. Statistics is always as much of an art as it is a science and even the best of statistical approaches will not give informative results if the underlying data is not what it is assumed to be. With all those pitfalls in mind, I have taken a very redundant, conservative approach to my analysis, an approach using different tools to cross check I wasn't making a silly mistake and which I kept as close to first principles as I could so as to avoid the oh so common errors resulting from plugging data into the wrong formula or algorithm. Finally, I would mention the importance of thinking carefully about exactly what question I am answering with any specific analysis.

I noted above that I thought I rode four of my routes at speeds that were, on average, the same. Here is the data on those four routes:



Back when I first started riding the Alpine and Alpine-Cañada routes, I predicted that, because the Alpine-Cañada route was longer, it would, on average, be slower, but that has not been my experience. Subjectively, I noted that my average speed on the two routes tended to be similar. On the other hand, looking at the small changes in the routes caused by my move from San Carlos to Emerald Hills, I predicted that, in the long run, the average speeds on the new routes and the old would be very close to the same. However, what might obscure that similarity in the short term is that my Fatigue seems to have been high for a lot of the time since the move. That illustrates a general complication of the analysis I am trying to do. Even assuming that ride speeds are a simple reflection of my Form, they will not be randomly distributed over time; fast rides and slow rides will cluster, something I needed to keep firmly in mind as I did my analysis.

I could try to assess my Form using just the data from one of the four above datasets but there would be significant advantages if I could use them interchangeably, and thus my first statistical task was an analysis of variance to determine if I am formally justified in doing so. To perform Analysis of Variance (ANOVA), I used the formulas provided in "Primer of Biostatistics, Sixth Edition" by Stanton A. Glantz. (In general, that book was my guide for most of the statistics in this post. Hereafter, I will refer to this book as "Primer of Biostatistics.") Rather than use the table of critical values in that book, I used the Google Sheets FDIST function to calculate the P-value for the hypothesis that the rides from the four routes were from the same "group", i.e. have the same average speed. The equations I used required that the four groups to be compared have the same number of samples. The Alpine-Cañada route is the one I have ridden the fewest times, 29, so I had to take subsets of the other three so that I had four sets of 29 rides to compare. The normal way of doing that is to pick random samples. However, because the data is clustered with respect to time, that is, the speed I ride on Monday tends to be correlated to the speed I ride on Wednesday much more than it is to the ride I did six months ago, I used matching instead. Since the set needing the most subsampling, the Alpine ride, is interleaved with the smallest set, the Alpine-Cañada ride, I picked subsamples from the Alpine ride one week before or after an Alpine-Cañada ride when possible, and when that didn't give enough subsamples, scattered the remainder as evenly over time as I could. In the case of the New Alpine and New Alpine-Cañada datasets, they were only slightly too large (30 and 33 samples) and I had reason to believe that the most recent rides were outliers. In addition, I figured there was some virtue in having the rides I selected as close in time to the rides on the other two routes as possible. For those reasons, I picked the oldest 29 rides for each of these two routes.
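For readers who want to see the arithmetic, here is a minimal Python sketch of a one-way ANOVA for equal-sized groups, built from first principles in the spirit described above. It is not the spreadsheet used for this post; the function name `anova_f` and the speeds in `routes` are made up purely for illustration.

```python
from statistics import mean, variance

def anova_f(groups):
    """Return (F, df_between, df_within) for equal-sized groups."""
    m = len(groups)       # number of groups (routes)
    n = len(groups[0])    # number of rides in each group
    if any(len(g) != n for g in groups):
        raise ValueError("all groups must have the same number of rides")
    # Within-groups variance estimate: the average of the sample variances.
    s2_within = mean(variance(g) for g in groups)
    # Between-groups estimate: n times the variance of the group means.
    s2_between = n * variance([mean(g) for g in groups])
    return s2_between / s2_within, m - 1, m * (n - 1)

# Hypothetical speeds in mph, three rides per route (not real ride data).
routes = [
    [12.0, 12.4, 12.2],
    [12.1, 12.5, 12.3],
    [11.9, 12.3, 12.1],
    [12.2, 12.6, 12.4],
]
f_stat, df_num, df_den = anova_f(routes)
print(f"F = {f_stat:.2f} with {df_num} and {df_den} degrees of freedom")
```

The F value and the two degrees of freedom are the three inputs that a right-tail function such as the Google Sheets FDIST needs in order to return a P-value.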

When I then did an ANOVA analysis on these four sets of 29 rides I determined that there was an approximately 60% chance they were all the same and a 40% chance that at least one was different. If I had been trying to prove that one of these rides was different (the most common use for this kind of analysis) I would have been disappointed that I failed the P < 0.05 test. Since I am hoping for the opposite, I should be happy, right? Well, as happy as I can be. In my opinion, there is no way to "prove" that rides on all four routes have the same average speed, I can only say I failed to prove otherwise. What my analysis does say is that there is no good reason, based on this data, to separate rides on these four routes from each other and thus provides justification for me to treat my ride speed on any of these four routes the same.

Does common sense agree with the above analysis? Common sense says that my average speed on these four rides cannot be exactly the same; after all, they are all different routes and those differences are almost certain to affect average speed. The question I really care about is this: "How big is the impact of route selection on average speed compared to the impact of Form?" Fortunately, statistics has a way to estimate the answer to that question, calculating confidence intervals. 

Using the same four subsets of rides as I used for ANOVA, there are six possible pairwise comparisons I could make: Alpine vs Alpine-Cañada, Alpine vs New Alpine, Alpine vs New Alpine-Cañada, Alpine-Cañada vs New Alpine-Cañada, Alpine-Cañada vs New Alpine, and New Alpine vs New Alpine-Cañada. However, it would be a mistake to blindly make all six comparisons. If comparisons are done with the usual P < 0.05 criterion, there is a one in twenty chance that any difference declared significant will, in fact, be due to chance. With two comparisons, because there are now two chances to get unlucky, there is an almost 10% chance at least one of the two will be positive, and by the time all six comparisons are made, that 5% risk has risen to about 26%. There are ways of correcting for that, but these corrections reduce the sensitivity of the analysis and so the right thing to do is to make as few comparisons as is necessary to answer the question at hand. I decided to make only one comparison, New Alpine vs New Alpine-Cañada. The reason I picked that one is that I probably will never ride the Alpine and Alpine-Cañada routes again (because my rides are door to door and the location of my door has permanently changed with my move from San Carlos to Emerald Hills) and so going forward, what I really want to know is if I am justified in pooling my rides on those two different routes. The 29 rides I used from the set of rides taken on the New Alpine route have an average speed of 12.06 mph, compared to those taken on the longer New Alpine-Cañada route which have an average speed of 12.24 mph. The difference in those average speeds is 0.18 mph. When I revisit that a year from now when I have 50 or 60 rides on each route, is that difference likely to stay the same, get larger, or get smaller? How close is 12.06 mph to the true average speed I would see on the New Alpine route if I rode it many more times?
Using the approach outlined in Chapter 6 of Primer of Biostatistics, I calculated that it is 95% certain that the real difference in average speed over these two routes is between -0.01 and +0.47 mph. That is, the New Alpine route may even be a bit faster than the New Alpine-Cañada route, but is not likely to be more than 0.47 mph slower. Without belaboring the point, this suggests that any difference in speed between these two routes is unlikely to confound my attempts to determine my current Form using a random mixture of rides on them. Going forward, I will use rides on all four routes interchangeably and refer to them as the Alpine-Like routes and rides on those routes as Alpine-Like rides.
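The two calculations in this part of the post can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the worksheet behind the numbers above: the names `familywise_error` and `mean_diff_ci` are invented here, the interval uses a pooled-variance estimate (one common textbook form), the critical t value is passed in from a table rather than computed, and the two short lists of speeds are made up.

```python
from math import sqrt
from statistics import mean, variance

def familywise_error(k, alpha=0.05):
    """Chance of at least one false positive in k independent comparisons."""
    return 1 - (1 - alpha) ** k

def mean_diff_ci(a, b, t_crit):
    """Confidence interval for mean(a) - mean(b) using a pooled variance.

    t_crit is the two-tailed critical t value for len(a) + len(b) - 2
    degrees of freedom, looked up in a table (an assumption of this sketch).
    """
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    se = sqrt(pooled * (1 / n_a + 1 / n_b))
    diff = mean(a) - mean(b)
    return diff - t_crit * se, diff + t_crit * se

print(f"2 comparisons: {familywise_error(2):.1%}")
print(f"6 comparisons: {familywise_error(6):.1%}")

# Hypothetical speeds in mph (not real ride data); t_crit = 2.776 for 4 df.
low, high = mean_diff_ci([12.0, 12.4, 12.2], [12.1, 12.5, 12.3], t_crit=2.776)
print(f"95% CI for the difference: {low:.2f} to {high:.2f} mph")
```

An interval that includes zero, as in this toy example, is the confidence-interval way of saying that the data do not rule out the two routes having the same average speed.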

There is one more approach I would like to introduce before asking the question that inspired this blog post. Given the assumption (tested above) that speeds on all four of the Alpine-related routes can be used interchangeably, I have a total of 230 rides ridden over more than 40 months. This number is so large that I am going to consider it a statistical universe which allows me to use a different kind of test to ask questions like "Have my slow speeds over the last two months been slower than expected by chance?" That test is the one sample T-test, which determines if a set of measurements matches a known value. For example, given the above analysis, I am now claiming that I know that my average speed on any mixture of the Alpine routes is 12.26 mph. Using this approach, I don't have to take a small subset of those rides to compare to a small number of recent rides, I can compare those recent rides to the mean determined from the entire dataset. Unfortunately, Primer of Biostatistics does not include this version of the t-test, so I found an online calculator to do that for me:
I then recalculated using tools provided by Google Sheets as outlined in the following website:
Good news, the two approaches gave the same answer. The only thing left to do is decide which of my recent rides I should compare to that known average.

Visualization is always a good place to start an analysis, and so finally the graph at the top of this post becomes relevant. It displays the speed at which I rode each Alpine-Like ride since moving to my new home in Emerald Hills. The actual ride speeds are in blue. The line in red is a running average of 8 of those ride speeds centered on each of those data points. My hypothesis based on that graph is as follows: during September and October of 2020 my speeds were increasing due to improved Fitness, during November and December my speeds were decreasing due to a buildup of Fatigue, and since then my speeds have remained low because I failed to alter my training to allow me to recover from that Fatigue. In this post I will not attempt to determine if it is actually a buildup of Fatigue that caused my rides to be slow rather than lack of Fitness. I will simply ask: Are my recent rides truly slow or did I just have a few slow rides due to chance? There are statistical approaches for analyzing the rate at which ride speeds are increasing or decreasing, but I will not attempt to develop those for this post. That being the case, the easiest (and most relevant) rides to test are those rides that I am claiming were ridden at an unchanging low speed due to Fatigue, the rides between the beginning of February of 2021 and the middle of May. There are 24 rides between those dates with an average speed of 11.88 ± 0.46 mph. Using the one sample T-test, the chance that this is the same as the 12.26 mph average speed of all 230 Alpine-Like rides is ~0.004%, virtually non-existent. My recent rides have truly been slower than average.
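A centered running average like the red line can be sketched as follows. This is only one way to do it, and it rests on assumptions: with an even window such as 8 there is no exact center, so this sketch takes the 4 rides before each point, the point itself, and the 3 rides after; it also leaves the ends of the series blank rather than averaging a partial window. The function name and the sample speeds are made up.

```python
def centered_moving_average(values, window=8):
    """Average each point with its neighbors; None where the window won't fit."""
    before = window // 2           # rides taken from before the current ride
    after = window - before - 1    # rides taken from after the current ride
    averages = []
    for i in range(len(values)):
        if i - before < 0 or i + after >= len(values):
            averages.append(None)  # not enough rides on one side
        else:
            chunk = values[i - before : i + after + 1]
            averages.append(sum(chunk) / len(chunk))
    return averages

# Hypothetical speeds in mph (not real ride data), with a short window.
speeds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
smoothed = centered_moving_average(speeds, window=4)
print(smoothed)
```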

How much slower have my recent rides been? The Standard Deviation for the 24 recent rides is 0.46, from which I can calculate that the Standard Error of the Mean (SEM) of those rides is 0.093. With 23 degrees of freedom, the 95% confidence interval is about two SEMs on either side of the mean, and so I can be 95% sure my true average speed between those dates was between about 11.7 and 12.1 mph. Although statistically different, that is not very different in magnitude from my all time average of 12.26 mph. Sure, my ride speeds were really slower, but were they enough slower to matter? That is not a question of statistics, it is a question of biology and exercise science.
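The one sample t statistic, the SEM, and the confidence interval can all be computed from the summary numbers alone. The Python sketch below does that using the rounded values quoted in this post, so its last digit may differ from figures computed on the raw ride data. The function name is invented for illustration, and the critical value 2.069 (two-tailed, 95%, 23 degrees of freedom) is taken from a standard t table rather than computed.

```python
from math import sqrt

def one_sample_summary(sample_mean, sample_sd, n, reference_mean, t_crit):
    """Return (SEM, t statistic, degrees of freedom, confidence interval)."""
    sem = sample_sd / sqrt(n)
    t_stat = (sample_mean - reference_mean) / sem
    ci = (sample_mean - t_crit * sem, sample_mean + t_crit * sem)
    return sem, t_stat, n - 1, ci

# Rounded summary values from the post: 24 recent rides vs. the 12.26 mph mean.
sem, t_stat, df, ci = one_sample_summary(11.88, 0.46, 24, 12.26, t_crit=2.069)
print(f"SEM = {sem:.3f}, t = {t_stat:.2f}, df = {df}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f} mph")
```

The P-value itself is left to a tool that knows the t distribution; in Google Sheets, for example, TDIST takes the absolute value of t, the degrees of freedom, and the number of tails.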

The biological rather than statistical significance of my slow rides is not a question for a statistician but for an exercise physiologist (scientist) or a coach. I am unaware of any guidance from any scientist or coach on this precise question, and besides, every athlete is different and I am very different than the athletes scientists and coaches usually discuss, so I will attempt to answer this question myself. Let me start by asking what was the actual event that finally made me decide I should not attempt to ride the Art of Survival this year? It was that I did not complete the training plan I had developed to get ready for this ride. Specifically, I failed to complete the last, 55 mile long training ride. This is similar to the reasons I failed to complete 200K brevets back between 2012 and 2014; it was not that I attempted the brevet and gave up along the way, it was that, as I approached the end of my training plan, I did not complete the longest rides in those plans. I have previously discussed ad nauseam why that might have happened and won't repeat that here, I will just take it as a fact. Back then, I noted that relatively small changes in my MAF test rides seemed to predict success or failure in preparing for a brevet. What I have accomplished in this post is to develop a California replacement for the MAF test. It is not that a 0.4 mph decrease in the speed at which I would have ridden the Art of Survival would have been the difference between success and failure; at speeds around 12 mph, that would have only made a 5 hour ride about 10 minutes longer, a matter of no consequence. It was that this decrease in speed on my standard rides is an indicator of the level of my Form. Attempting a physically challenging ride with such poor Form would, in my opinion, have been unwise. So yes, I believe that the relatively small decrease in average speed I have been riding recently is important, not in its own right, but as an indicator of my overall wellbeing.

Am I guilty of attacking a gnat with a sledge hammer? Have I belabored an obvious point? I don't think so. I have been eyeballing ride speed as an indicator of Form since I restarted cycling back in 2008 even though I knew that I might be deceiving myself. My tracking got even shakier when I moved to California, lost access to the Rice Track and MAF tests, and stopped using a heart rate monitor. Coaches recommend riding a test ride every month or so to assess Form. The problem with that is that it is a single ride. Both intuition ("it was just a bad day") and statistics counsel us on the folly of basing a conclusion on one of anything. Back in Houston, I liked that the many MAF tests I rode for training gave me a statistically robust indicator of Form, and by combining four of my most common rides I can now replicate that to some extent here in California. And yet, with all that, I was still eyeballing. This post is one of my iceberg posts where only a tiny fraction of the effort I put into it shows above the surface. I recently posted how I had made a copy of my training log in a relational computer database. I could not have done the analysis for this post without that database. I had to relearn (and in some cases learn) the statistics I needed to do this analysis. I tried many different approaches to analyzing the data as my originally fuzzy thinking about the questions I was asking became clearer. And finally, in the process of asking one specific question about my recent rides, I assembled statistical tools that will make it easier to use objective statistical analysis in place of eyeballing going forward. I may never figure out why I felt fatigued in the runup to Art of Survival 2021 or know if skipping it was the right decision, but at least now, one piece of that puzzle is real and not imaginary. As I have said many times before, I blog because it is fun, I do not deceive myself that it is any substitute for time on the bike, and it never is. I have never forgone even a single ride to work on this blog. And yet, I take satisfaction in knowing I am one small step closer to understanding why my cycling doesn't always go the way I'd like it to.

2 comments:

  1. According to Dan O'Neil of Odd Bodkins fame, truth is a 5 to 4 decision in the Supreme Court.

    Looking at your graphic one thing struck me: the data is really noisy, so even an eight day moving average is pretty unstable. Oftentimes researchers with noisy data start tossing out the 10% of the observations that are the most distant from some value of central tendency. If you did that the graph would show the best times in the fall after a full season of riding, followed by a winter decline when you ride less due to the weather and currently fairly stable riding speeds as you are picking up more rides with better weather.

    1. Zinger,

      Thanks for the comment! We don't really have an off-season here in California and as best I can tell, I don't ride less in Winter than Summer. I will be saying more about that in my next blog post. I have taken your suggestion about removing outliers very seriously and am looking into the best way to do that.
