Better Programming

Individualizing My Whoop 4.0 Data To Better My Athletic Performance

Cinoon Bak
Published in Better Programming
11 min read · Feb 26, 2023

Photo by Baylee Griffin

Despite what my middle school and high school coaches used to say about training harder than everyone else, scientific evidence suggests that this approach may not be the most effective for athletes. I learned this lesson over the course of my career as a college baseball player. Pushing myself to train at maximum capacity every single day did not necessarily yield the best results. Overtraining is a real thing; there may be something to “working smarter, not harder.”

In recent years, wearable fitness trackers have become increasingly popular. After considering various options, I chose Whoop for its detailed daily reports, which are tailored to people seeking to optimize their training and performance. Beyond counting calories and steps, the device serves as a wrist-mounted coach that provides feedback on training, sleep, and overall health. More details on its features can be found on the Whoop website.

How often did I wear it? I wore it 24/7. When I say 24/7, I mean I never took it off, not even in the shower. The device has a clip-on charger that allows the Whoop to remain on your wrist while charging. The photo above shows me wearing it during baseball games, as well as during practice sessions and workouts.

If you’re interested in reviewing the completed code and visualizations for this analysis, you can find them on my GitHub repository.

Data Collection

As mentioned above, I wore the Whoop continuously for exactly one year. During this time, it collected various types of data, including day strain, calorie expenditure, sleep duration and performance, and several heart rate metrics. I could access this daily data through the Whoop app on my phone.

Whoop website app and features

One of the great features of Whoop 4.0 is that the device automatically detects which activities you are engaged in and records them, including the start time of each activity. Similarly, it automatically registers your sleep patterns by detecting when you get into bed, fall asleep, and wake up.

Data Concerns and Caveats

Since wearable fitness trackers are not classified as medical devices, their data accuracy may be limited. I observed several errors over the course of the year, such as incorrect activity tracking or inaccuracies in the start time of an activity. Similarly, the device sometimes failed to record my sleep duration accurately. On occasion, it registered two separate sleep times if I woke up in the middle of the night to use the restroom, which caused my recovery scores to be skewed. However, I could correct most of these issues through the app on my phone once I noticed the errors.

Data Processing

Whoop makes it incredibly simple for users to export their personal data. All it takes is a quick tap of a button on the mobile app, and within moments, users will receive an email from Whoop containing three separate CSV files.

Shown below is the first step for exporting data. More instructions can be found here.

Whoop website app and features

Whoop offers three distinct CSV files to users, containing data related to sleep, workouts, and physiological metrics. In my experience, the physiological data was the most useful for my data analysis, as it provided a well-rounded view of both my sleep and workout data.

Data Cleaning

After importing the CSV file into my Jupyter notebook, I noticed it contained 26 columns, some of which were not relevant to my analysis. To streamline the data, I deleted six unnecessary columns and created four new columns that were more useful. Specifically, I generated columns to indicate the days of the week and associated colors based on my recovery score. For the purpose of this analysis, I also created two Boolean columns to reflect how my body felt in relation to the recovery score.
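The cleaning steps described above could look roughly like the sketch below. It is not the notebook's actual code: the column names (`Cycle start time`, `Recovery score %`, `Unused column`), the sample rows, and the color cut-offs (green at 67% and above, yellow from 34% to 66%, red below that) are all assumptions made for illustration.

```python
import pandas as pd

# Tiny stand-in for the exported physiological CSV. The column names and
# values are assumptions, not the exact headers or data of the Whoop export.
df = pd.DataFrame({
    "Cycle start time": ["2022-03-07 23:10:00", "2022-03-08 22:45:00", "2022-03-12 00:30:00"],
    "Recovery score %": [28, 55, 91],
    "Unused column": [None, None, None],
})

# Drop columns that are not needed for the analysis.
df = df.drop(columns=["Unused column"])

# Derive the day of the week from the cycle start timestamp.
df["Cycle start time"] = pd.to_datetime(df["Cycle start time"])
df["DayOfWeek"] = df["Cycle start time"].dt.day_name()


def recovery_color(score):
    """Map a recovery score to a zone color (cut-offs are assumed)."""
    if score >= 67:
        return "green"
    if score >= 34:
        return "yellow"
    return "red"


df["RecoveryColor"] = df["Recovery score %"].apply(recovery_color)
print(df[["DayOfWeek", "Recovery score %", "RecoveryColor"]])
```

With a real export, the `pd.DataFrame(...)` call would be replaced by `pd.read_csv(...)` on the physiological file.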

Over the year, I noticed that when my recovery score was 40% or higher, I felt fine, and my day-to-day performance was not affected. Conversely, when my recovery score was under 40%, my body felt tired, and I was not able to train at my best. To capture this, I created a “RecoveryFeeling” column: 0 represents a score under 40%, and 1 represents 40% or greater. When my recovery score reached 90% or higher, I was at my absolute best and could almost always push through my hardest training sessions. To capture this, I created a “RecoveryBestFeeling” column: 0 represents a score under 90%, and 1 represents 90% or greater.

The code below was used for the two Boolean columns created.

Code for creating two boolean data type columns
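A minimal sketch of how the two columns could be derived with pandas is shown here. The column names “RecoveryFeeling” and “RecoveryBestFeeling” and the 40% and 90% thresholds come from the text above; the source column name `Recovery score %` and the sample scores are assumptions.

```python
import pandas as pd

# Made-up recovery scores; "Recovery score %" is an assumed column name.
df = pd.DataFrame({"Recovery score %": [12, 39, 40, 75, 90, 98]})

# 1 when the score is 40% or greater, 0 when it is under 40%.
df["RecoveryFeeling"] = (df["Recovery score %"] >= 40).astype(int)

# 1 when the score is 90% or greater, 0 when it is under 90%.
df["RecoveryBestFeeling"] = (df["Recovery score %"] >= 90).astype(int)

print(df)
```

Storing the flags as 0/1 integers rather than True/False keeps them directly usable as classification targets later on.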

Exploratory Data Analysis: Four things I found through EDA

1. More cardio = more strain on my body

One important feature of the Whoop tracker is its ability to calculate more than just the calories burned for each activity. It also provides a metric called “strain,” which summarizes the cardiovascular load on the body during the workout based on the heart rate. In addition, there is a metric called “calorie,” which summarizes the number of calories burned during the activity based on the basal metabolic rate (BMR) and heart rate. I have noticed that engaging in more cardio-intensive workouts results in higher strain on my body.

Visualization using Tableau Public (Image by Author)

Throwing a baseball may not appear to be rigorous cardiovascular exercise; however, for a pitcher, repeatedly throwing a baseball as hard as possible demands both strength and cardiovascular effort. The data shows that playing baseball burns the most calories and also puts the most strain on my body. The other three activities do not show the same pairing of high calorie burn and high strain.

Weightlifting ranks second in calorie burn after baseball, but it appears to strain my body less than the other activities. Conversely, the last two categories, activity (mostly walking/hiking) and running, do not burn as many calories but exert greater strain on my body. Although I feel that I push myself harder during weightlifting, these results suggest that the cardiovascular nature of running, walking, and hiking places greater strain on my body than I realized.
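The comparison above was built in Tableau, but the same per-activity averages could be computed in pandas along these lines. The column names and every number in the sample frame are invented for illustration and are not my actual workout data.

```python
import pandas as pd

# Invented workout rows; column names are assumptions about the workouts CSV.
workouts = pd.DataFrame({
    "Activity name": ["Baseball", "Weightlifting", "Running", "Baseball", "Activity", "Running"],
    "Activity Strain": [14.0, 6.0, 10.0, 16.0, 9.0, 12.0],
    "Energy burned (cal)": [900, 500, 350, 1100, 300, 450],
})

# Average strain and calories per activity, sorted by calorie burn.
summary = (
    workouts.groupby("Activity name")[["Activity Strain", "Energy burned (cal)"]]
    .mean()
    .sort_values("Energy burned (cal)", ascending=False)
)
print(summary)
```

Placing the two averages side by side makes it easy to spot activities whose strain ranking differs from their calorie ranking.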

2. Exercising for better/greater amounts of sleep did not apply to me

One of the most effective methods for people to improve their sleep quality is to exercise and tire their body. Doctors even recommend this method for those who struggle with sleeping. However, my personal data suggests that this may not be the case for me.

Visualization using Tableau Public (Image by Author)

The correlations between the two variables in each of the four graphs are as follows:

  • Relationship between day strain and sleep time: 0.011
  • Relationship between day strain and sleep efficiency: 0.004
  • Relationship between calories burned and sleep time: 0.1127
  • Relationship between calories burned and sleep efficiency: -0.1060

These correlations are all close to zero, which indicates that burning more calories or straining my body more has little to no linear relationship with my sleep time or efficiency.
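Coefficients like these can be computed with pandas, as sketched below under the assumption that they are Pearson correlations (the default of `Series.corr`). The column names and sample values are made up, so the printed numbers will not match the ones listed above.

```python
import pandas as pd

# Made-up daily values; column names are assumptions about the export.
days = pd.DataFrame({
    "Day Strain": [8.0, 12.5, 15.1, 6.3, 10.2, 17.8],
    "Energy burned (cal)": [2100, 2600, 3100, 1900, 2400, 3300],
    "Asleep duration (min)": [410, 395, 430, 402, 388, 415],
    "Sleep efficiency %": [91, 88, 93, 90, 92, 89],
})

pairs = [
    ("Day Strain", "Asleep duration (min)"),
    ("Day Strain", "Sleep efficiency %"),
    ("Energy burned (cal)", "Asleep duration (min)"),
    ("Energy burned (cal)", "Sleep efficiency %"),
]

# Series.corr uses the Pearson method unless told otherwise.
correlations = {(x, y): days[x].corr(days[y]) for x, y in pairs}

for (x, y), r in correlations.items():
    print(f"{x} vs {y}: r = {r:.3f}")
```

A value near 0 means there is no clear linear trend between the two columns; it does not rule out a non-linear relationship.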

3. My heart rate variability (HRV) rises significantly when I play in baseball games

Heart rate variability (HRV) refers to the variation in time between the beats of one’s heart. A higher HRV indicates that the body is more capable of performing at a high level and handling stress. The Whoop website offers a comprehensive explanation of HRV.

Visualization using Tableau Public (Image by Author)

I underwent Tommy John Surgery in early February 2021 and began throwing against hitters in early February 2022. However, my velocity and command didn’t fully recover during the season, so I did not pitch in games from February to early May. In late May, I played summer ball until the end of June and then entered the off-season.

I did not face any hitters until late September and October. Interestingly, my two months of peak HRV coincided with the time I was throwing in games against hitters. I believe this is because I was extremely cautious with my daily training, recovery, and overall lifestyle. My focus was solely on my performance on the field, and my days were centered around baseball.

4. Tuesday and Thursday are my hardest training days

Whoop has helped me understand when to push myself during workouts and when to take it easy. I relied heavily on the recovery score/color to guide my training. If my recovery score was in the green or yellow, I would train hard, but if it was in the red, I would try to ease up.

Visualization using Tableau Public (Image by Author)

The data reveals that I trained harder on the days when I had good recovery scores. On average, I burned the most calories on my green recovery days and the least on red recovery days. My average recovery score throughout the year was 64%, which is above the 58% that Whoop reports as the average recovery score for its members worldwide.
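The calorie comparison by recovery color could be reproduced in pandas roughly as follows. The zone boundaries used here (red up to 33%, yellow up to 66%, green above that), the column names, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Made-up days; column names are assumptions about the export.
days = pd.DataFrame({
    "Recovery score %": [25, 30, 50, 60, 70, 95],
    "Energy burned (cal)": [2000, 2200, 2500, 2700, 3000, 3200],
})

# Bucket each score into a color zone (boundaries are assumed).
days["RecoveryColor"] = pd.cut(
    days["Recovery score %"],
    bins=[-1, 33, 66, 100],
    labels=["red", "yellow", "green"],
)

# Average calories burned within each zone.
avg_calories = days.groupby("RecoveryColor", observed=True)["Energy burned (cal)"].mean()
print(avg_calories)
```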

Visualization using Tableau Public (Image by Author)

The graph above indicates that my recovery score is higher on average on Monday, Tuesday, and Thursday. This corresponds directly with my previous findings, as Tuesday and Thursday are the days when I put the most strain on my body. Because my recovery score is higher on these days, I can push harder during my training sessions. Conversely, I typically do not strain my body as much on weekends, as I take most weekends off from training, regardless of my recovery score.
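A day-of-week breakdown like this one can be sketched in pandas as below. The dates, scores, and column names are invented; the ordered categorical is only there so the result is sorted Monday through Sunday instead of alphabetically.

```python
import pandas as pd

# Invented rows; column names are assumptions about the export.
days = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-03-07", "2022-03-08", "2022-03-10",
        "2022-03-12", "2022-03-14", "2022-03-15",
    ]),
    "Recovery score %": [70, 80, 75, 50, 60, 90],
    "Day Strain": [10.0, 15.0, 14.0, 5.0, 12.0, 17.0],
})

week_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Ordered categorical keeps the weekdays in calendar order after grouping.
days["DayOfWeek"] = pd.Categorical(
    days["Date"].dt.day_name(), categories=week_order, ordered=True
)

by_day = days.groupby("DayOfWeek", observed=True)[["Recovery score %", "Day Strain"]].mean()
print(by_day)
```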

Visualization using Python Matplotlib (Image by Author)

Decision Tree

As mentioned during the data cleaning process, I created two columns based on how my body felt at two specific recovery score levels. Typically, when my recovery score was over 40%, I felt normal and could train at my usual level. However, throughout the year, I noticed that when my recovery score was over 90%, my body could push through and adapt to more challenging training sessions.

Why I chose to model a decision tree

The goal was to create a visual set of rules showing the path to each possible outcome for the two Boolean columns I created. This would help me understand which variables were factors in achieving a specific recovery score and what range of values these variables should be in. My ultimate aim was to learn what actions I could take to improve my chances of achieving a high recovery score.

The code below shows the basic default decision tree I made. I used the same code for two trees but with different columns.

Code for Both Decision Trees

I split the data into training and testing sets, with 33% of the data allocated to testing. Initially, I used the default settings for the decision tree classifier, which include the Gini criterion and no limit on the maximum depth (max_depth=None in scikit-learn's DecisionTreeClassifier). I also wanted to visualize which variables influenced whether I achieved a certain recovery score. I then pruned the decision tree to obtain the best accuracy for the rules I was interested in.
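A baseline along these lines might look like the sketch below, assuming scikit-learn. The feature names, the synthetic data, and the rule used to generate the target are all invented so the example runs on its own; only the 33% test split and the default classifier settings come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned data (names and values are assumptions).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "Resting heart rate (bpm)": rng.normal(55, 5, n),
    "Heart rate variability (ms)": rng.normal(70, 15, n),
    "Asleep duration (min)": rng.normal(420, 60, n),
})
# Invented target standing in for a column such as RecoveryFeeling.
y = ((X["Heart rate variability (ms)"] > 60) & (X["Asleep duration (min)"] > 360)).astype(int)

# Hold out 33% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Default settings: criterion="gini", max_depth=None (grow until leaves are pure).
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

To build the second tree, only the target column would change; the features and the split stay the same.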

Code for both Decision Tree Pruning and Visualization (Image by Author)

The code above shows how I pruned the tree and how I visualized the resulting tree. The pruning code was written with help from Gustavo Hideo's “Decision Tree: build, prune and visualize it using Python.”
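One simple way to prune is to try each combination of criterion and maximum depth and keep the most accurate tree, as sketched below. This is an assumed approach for illustration, not necessarily the exact procedure from the referenced article, and the data is synthetic. The rules are printed with `export_text`; `sklearn.tree.plot_tree` would draw the same tree as a figure.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (feature names, values, and target are assumptions).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "Resting heart rate (bpm)": rng.normal(55, 5, n),
    "Heart rate variability (ms)": rng.normal(70, 15, n),
    "Asleep duration (min)": rng.normal(420, 60, n),
})
y = ((X["Heart rate variability (ms)"] > 60) & (X["Asleep duration (min)"] > 360)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Try every criterion/depth pair and remember the most accurate tree.
best = None
for criterion in ("gini", "entropy"):
    for depth in range(1, 8):
        tree = DecisionTreeClassifier(
            criterion=criterion, max_depth=depth, random_state=42
        )
        tree.fit(X_train, y_train)
        acc = tree.score(X_test, y_test)
        if best is None or acc > best["accuracy"]:
            best = {
                "criterion": criterion,
                "max_depth": depth,
                "accuracy": acc,
                "model": tree,
            }

print(best["criterion"], best["max_depth"], round(best["accuracy"], 2))

# Text rendering of the chosen tree's decision rules.
rules = export_text(best["model"], feature_names=list(X.columns))
print(rules)
```

Choosing the settings by test-set accuracy tends to overstate how well the tree generalizes; a separate validation split or cross-validation is the more cautious option.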

Pruned tree for 40% recovery: criterion = entropy, max depth = 2

Accuracy for 40% recovery: 0.94

Pruned tree for 90% recovery: criterion = entropy, max depth = 3

Accuracy for 90% recovery: 0.87

Decision Tree Visualization for Recovery Score Over 40% (Image by Author)

The tree above indicates that a resting heart rate (RHR) of 56.5 or higher and a heart rate variability (HRV) of 53.5 or higher are likely to result in a recovery score of over 40%. Alternatively, even with an RHR below 56.5, a recovery score of over 40% is achievable by sleeping for at least 259.5 minutes (about 4 hours 20 minutes).

Decision Tree Visualization for Recovery Score Over 90% (Image by Author)

The tree above is more complex than the previous one. It suggests that to achieve a recovery score of over 90%, I need an HRV of at least 80.5, followed by at least 190 minutes (3 hours 10 minutes) of light sleep, while burning fewer than 2654.5 calories. On the other hand, if my HRV is at least 80.5 but my light sleep is under 190 minutes, a day strain below 7.2 can still result in a recovery score of over 90%. However, if my HRV is below 80.5, I am unlikely to achieve a recovery score over 90%, regardless of other factors.
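The paths described above can be written out as a small function. The thresholds are the ones read off the tree; the function name, its parameter names, and the treatment of values exactly on a threshold are illustrative choices.

```python
def likely_over_90(hrv, light_sleep_min, calories, day_strain):
    """Follow the described tree paths for a recovery score over 90%."""
    # Below the HRV threshold, the tree predicts no 90%+ recovery.
    if hrv < 80.5:
        return False
    # Enough light sleep: the outcome depends on calories burned.
    if light_sleep_min >= 190:
        return calories < 2654.5
    # Not enough light sleep: a low day strain can still get there.
    return day_strain < 7.2


# High HRV, 200 minutes of light sleep, moderate calorie burn.
print(likely_over_90(hrv=85, light_sleep_min=200, calories=2500, day_strain=10))
# High HRV, short light sleep, but an easy day.
print(likely_over_90(hrv=85, light_sleep_min=150, calories=3000, day_strain=6.0))
# Low HRV: other inputs do not matter.
print(likely_over_90(hrv=70, light_sleep_min=250, calories=2000, day_strain=3))
```

Written this way, it is easy to see that HRV acts as the gate: none of the other inputs are checked unless it reaches 80.5.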

Better Recovery = Better Performance

What can I do to improve my recovery?

Getting a recovery score of over 40% seems relatively achievable, with the main focus being on getting more than five hours of sleep per night, given that I’m not sick.

However, achieving a recovery score of over 90% is more challenging, as controlling factors such as HRV and light sleep during the sleep cycle is difficult. The three controllable factors for achieving a high recovery score are:

  1. If the next day is an important hard training day, I need to try to burn fewer than 2654.5 calories.
  2. Similarly, if the next day is an important hard training day, I need to try to keep my day strain below 7.2.
  3. Lastly, as the graph below shows, I need to sleep more than 359 minutes (about 6 hours) to get at least 190 minutes (3 hours 10 minutes) of light sleep.

Visualization using Tableau Public (Image by Author)

While these changes do not guarantee a recovery score of over 90%, they can increase the likelihood of achieving that score. It’s important to recognize that there are some factors outside of our control when it comes to fitness and health, but by personalizing our approach, we can give ourselves the best possible chance to improve our performance.

I had a great time getting to know myself better through this analysis, and I am excited to continue discovering more about my fitness as a non-baseball player.

If you are interested in analyzing your own Whoop data, feel free to check out my Jupyter notebook on this GitHub repository.

Written by Cinoon Bak

Quant Associate @First Citizens Bank
