Saturday, May 30, 2020

I Am Not Average

In case it is not obvious, this is not a real ad. Photo from U.S. News and World Report.


[Part 3 of 3 in a series on whether I should give sprint workouts another try.]

The Problem


Since before my first blog post back in 2012, I have been trying to decide if sprint workouts, AKA brisk or high intensity workouts, would be beneficial for me. On the one hand, when I have introduced sprint workouts into my routine, I have not noticed a lot of benefit. Rather, I find that they frequently drive me into overtraining. On the other hand, everything I have ever read about training highly recommends them.

So what have I been reading? The first things I started reading were training books written by cycling coaches. I have had a bipartite cycling career, the first part running from 1965 or so and continuing full force through 1970 and then slowly petering out until 1978 when it stopped completely. The second part started in 2008, stopped in 2009, started again in 2010, and has continued with ups and downs but no significant breaks into the present. Even back in 1965-1970 when I was racing, I did not have a particularly organized training plan; I just rode from challenge to challenge. When I restarted cycling in 2008, I took the same approach. However, when I tried to train for a 200K (124 mile) brevet (challenge ride), I found it more difficult than expected, and my wife bought me "The Complete Book of Long-Distance Cycling" by Burke and Pavelka, where I first encountered the concept of an organized training plan. Following the plan in that book I successfully prepared for a 200K brevet in the spring of 2012. However, I quickly ran into problems and developed questions which led me to read a variety of such books. I have found these books extremely helpful as a source of ideas, but my experience has been that some of those ideas seemed to work for me while others didn't. In retrospect, I think this was due to a fundamental limitation of such books. It seems to me that central to the concept of coaching is the interaction between coach and trainee. The coach tries workout ideas, sees how their trainee responds, and adjusts accordingly. This central interplay is, of course, missing from the book experience.

As I continued to explore the world of training advice, I encountered scientific studies that compared the virtues of different training plans. Coaching is based on intuition and experience. These studies are based on science, and as a retired scientist, I found that most enticing. There is a whole ecosystem around such studies: they are reported with varying degrees of inaccuracy by the popular press, they are summarized by scientific, medical, and government entities into guidelines and are one source of ideas that coaches use to develop suggestions for their trainees and to put into their books and articles and blog posts, all of which I read compulsively. In all cases, I try to go back to the original scientific publications and read them critically but with an open mind. To date, I have reviewed about ten different scientific studies on exercise on this blog. In part 1 of this series, I reviewed a scientific study that examined the benefits of sprint workouts for health in the elderly (e.g. me.) In part 2 of this series, I reviewed a scientific study that looked for correlations between training intensity and improvement in cycling performance. In the case of almost all of these studies, I seem to find something in them that makes me question their conclusions. Issues I have commented on to date include:

1. There are too few subjects in a study so the results are not statistically significant.
 
2. There are problems with the study design, things like changing more than one variable at a time, that make the results difficult to interpret.
 
3. The study is observational rather than experimental and as a result, cause and effect cannot be proven.
 
4. The study is a biomarker study and as a result, it is not clear that what I care about (e.g. health) is improved just because the biomarker (e.g. VO2peak) is improved.
 
5. The subjects are very different from me (younger, more athletic) making it unclear if the conclusions of the study apply to me.
I would like to invest a few more words on Issue #5 and focus on one way in which the participants in most studies differ from me: they start with subjects who are not currently exercising. In contrast, I have been cycling more or less continuously for well over a decade. Why do so many studies start with sedentary subjects? I have yet to read an explanation but I have my guesses. One guess is that someone who is not exercising and then starts will exhibit a large increase in fitness. Large effects are easier to study, making this choice attractive to the scientists conducting the study. A second guess is that being sedentary is a fairly uniform state, people who are not exercising at all are relatively similar one to another. If you did a study on people who are already exercising, it is likely that they will have different exercise schedules and thus will be starting from different places relative to their maximum fitness. Thus, the same exercise program would be a step up in difficulty for some, and a step down for others. Is this really a problem? Would we not expect that the exercise program that best helps someone get into shape would be the same that would help someone stay in shape? Maybe, but given the popularity of periodized training, I suspect most coaches would not agree. The training that is best at the beginning of the season when you first restart training is very different than the training that is best at the peak of the season, and I would expect that the training approach which is best when you first start cycling is different than what is best after you have been riding for a few years.

Finally, I would like to introduce one more related issue with almost every scientific study ever done on exercise:

6. Results from such studies are the values averaged across a number of subjects.
 
The problem with that is that nobody is average; everyone is unique. I have posted a lot on this blog about individual variation. If everyone is different, does it make sense to do studies at all? If everyone is different, doesn't that mean that everyone is just going to have to figure out the best exercise schedule for themselves? I actually don't think so, not from scratch. Although no two people are exactly alike, we do have a lot in common so that, though not perfect, studies comparing exercise protocols which report an average across multiple participants are way better than no study at all and I appreciate having such studies very much. Sure, I have to test their conclusions for myself, but knowing the average response to a particular training routine helps me know where to start. That said, I think it is possible to do such studies in a way that would make them even more useful.

Aren't statistics necessary for helping to determine if the results of a study are real or are due to random chance? Yes they are, but there are different ways of applying statistics and how statistics are used needs to match what one is trying to accomplish. In the context of training for cycling, there are two sources of variability in performance. The first is day to day variability. For a variety of reasons, people have good days and bad days. Usually, that is not very interesting and it gets in the way of comparing different training programs. Imagine I want to determine if a polarized plan or a moderate intensity plan is better to help me prepare for a metric century. I do the polarized plan, ride a metric century, but as luck would have it, I have a bad day that day, so my speed is slow. For the next ride, I prepare using moderate intensity training. That day, I have a really good day, so my speed is fast. I conclude that moderate training is better for me, but this is a flawed conclusion; the random noise introduced by good days and bad days has obscured the true result. I need some way to average out those good days and bad days. The way most studies do this is by averaging the results of several different riders, some of whom are having good days and others having bad days. The problem with that approach is that there is a second source of variability, and that is person to person variability. Let me explain with one example. There is a great deal of anecdotal evidence that as one ages, one needs more recovery between hard rides. Imagine a study to determine the optimal number of rides per week: 3, 4, 5, or 6. Imagine the study groups contained a mixture of riders of different ages.
If one looked at the older and the younger riders separately, one might find different optima, maybe 4 days a week for the older riders and 6 days a week for the younger, but if one averages everyone together, one might find an average optimum of 5 days a week, optimal for neither group. One obvious way to deal with that particular problem is to subdivide the study groups into groups of similar riders: men vs. women, older vs. younger, serious vs. casual, those who have been riding a long time vs. beginners, and so forth. One problem with that approach is that studies become very large; they require lots of participants to cover all the different subgroups. Another problem which I consider to be even more serious is that I believe there are subgroups of riders who will give very different results in a study who cannot be easily identified, people who look the same but who differ genetically in ways that affect their response to exercise. The solution to that problem is described in the Wikipedia article on N of 1 trials. Rather than average the results of several subjects, one averages the results of several tests on the same subject. In principle, this allows a study to reduce the noise generated by good days and bad days but retain the information on person to person variability. This approach is not perfect either. In the first place, each subject is accumulating training, building up fatigue, and aging as the study progresses. In the second place, it requires very long studies to provide the time needed to test different exercise protocols on each of the subjects in a study. Ideally, a mixture of N of 1 protocols and more conventional protocols on well defined subgroups would complement each other, providing more information than either would alone. However, this only aggravates the problem of needing very large numbers of subjects and long study times.
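To make the rides-per-week example concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration, not data from any study: the response curve in true_gain, the Gaussian good-day/bad-day noise, the 50/50 split of riders, and all of the function names. It only shows how averaging across riders can point to a schedule that suits neither subgroup, while repeated tests on one rider can recover that rider's own optimum.

```python
import random
from statistics import mean

RIDES_PER_WEEK = [3, 4, 5, 6]  # schedules compared in the imagined study


def true_gain(optimum, rides):
    """Made-up response curve: gain peaks at the rider's own optimum."""
    return 10.0 - (rides - optimum) ** 2


def observed_gain(optimum, rides, rng, noise_sd=0.5):
    """True gain plus good-day/bad-day noise (assumed Gaussian)."""
    return true_gain(optimum, rides) + rng.gauss(0.0, noise_sd)


def best_schedule(scores):
    """Return the rides-per-week value with the highest mean score."""
    return max(scores, key=scores.get)


def pooled_optimum(rider_optima, rng):
    """Conventional design: one test per rider per schedule, averaged across riders."""
    scores = {
        rides: mean(observed_gain(opt, rides, rng) for opt in rider_optima)
        for rides in RIDES_PER_WEEK
    }
    return best_schedule(scores)


def n_of_1_optimum(optimum, rng, repeats=20):
    """N of 1 design: repeated tests on one rider, averaged within that rider."""
    scores = {
        rides: mean(observed_gain(optimum, rides, rng) for _ in range(repeats))
        for rides in RIDES_PER_WEEK
    }
    return best_schedule(scores)


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the run is repeatable
    # Hypothetical mixed group: 50 older riders (optimum 4), 50 younger (optimum 6).
    riders = [4] * 50 + [6] * 50
    print("pooled optimum:", pooled_optimum(riders, rng))
    print("N of 1, older rider:", n_of_1_optimum(4, rng))
    print("N of 1, younger rider:", n_of_1_optimum(6, rng))
```

With these assumed numbers, the pooled average favors 5 rides per week, which is optimal for neither subgroup, while the within-rider averages favor 4 and 6. The sketch ignores the complications mentioned above, such as accumulated fatigue and aging over a long study.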

A Proposed Solution


At long last that brings me to the picture at the top of this post. As a senior, I get a benefit from Medicare and my supplemental insurance plan called "Silver Sneakers", a free gym membership. Neither Medicare nor my insurance company provides this out of the goodness of their hearts; they provide it because if I exercise, I will be healthier and as a result they will save money on my medical care. Might that be true for younger people as well? Might it save insurance companies money to encourage exercise by paying for gym membership even for younger customers? Medicare, though it does not cover these younger people, might decide that on top of whatever immediate improvement in health exercise provides to the young, exercise now will make them healthier later when they reach their 60s and begin to be covered by Medicare, and thus save Medicare money in the long run. For the purposes of this post, let's assume that one or both of these is true, and that as a result, a significant number of people become eligible for subsidized gym membership. A requirement of such a subsidy might be that participants agree that in return, they will participate in studies comparing different exercise protocols. This could be a small ask; such studies could be designed to have minimal impact on a participant's training plans.

Full disclosure, I have not taken advantage of my Silver Sneakers benefit because my preferred exercise is cycling and the gyms that currently participate in Silver Sneakers don't particularly support cycling. What I have considered is working with Five Rings Cycling Center, an organization that provides coaching for a wide range of cyclists from serious racing cyclists to casual cyclists like me. If my plan to improve the usefulness of scientific studies on cycling were to happen, besides increasing the number of participants, Silver Sneakers would have to increase the range of providers to include groups like Five Rings. What would be the requirements for an organization to participate? First, that they provide a program that the medical community agrees improves health. I would expect that most coaching organizations would easily meet this requirement. Second, that they participate in the scientific study part of the program. A requirement for coaches employed as part of this plan is that they abide by government guidance in designing the plans for their clients. Is any of this at all likely? That is an interesting question, but one well beyond the scope of this post. The thought experiment which is the subject of this post is to imagine, if some of the current constraints were relaxed, how might scientific studies on the benefits of different kinds of exercise be improved. For the purposes of answering that question, assume that the expansion of Silver Sneakers proposed here happened. 

How would that impact scientific research on exercise in general and cycling in particular? Specifically, how would the above plan solve the six common problems with research studies outlined at the top of this post?
  1. "There are too few subjects in a study so the results are not statistically significant." The number of potential subjects available to studies would be dramatically increased. My guess is that this problem would become a thing of the past.
  2. "There are problems with the study design, things like changing more than one variable at a time, that make the results difficult to interpret." At first glance it might appear that this plan might not help with this problem, but in an organized system like the above there would be more opportunities for investigators to interact with each other which should improve the quality of studies and reduce problems with experimental design.
  3. "The study is observational rather than experimental and as a result, cause and effect cannot be proven." Coaches would ask people to do one or another plan. Those that were unwilling would not be included in the study.
  4. "The study is a biomarker study and as a result, it is not clear that health is improved just because a biomarker is." Because the studies would go on for a long time, actual health data could be obtained.
  5. "The subjects are very different from those wanting to use the results of the study ...  studies start with sedentary subjects." With lots of subjects available, many more subgroups similar to many more users will be available.
  6. "The results of almost all studies are reported as an average of a number of subjects." Because subjects would be enrolled for the long term, N of 1 protocols become possible.
Given that the proposed changes in Silver Sneakers have not happened, does this thought experiment have any value? I believe that it does. By clearly imagining what a more ideal study would look like, I feel I am better able to evaluate the studies that actually exist.