A Quest for Better Sleep with Fitbit Data Analysis

First things first: the data.
It would be a pointless quest if I couldn’t get my hands on my Fitbit data: like a hero without armor, I would not even get up from the couch without it, and for a while I actually feared I would be left in exactly that condition. My first impression while browsing my dashboard on the Fitbit website was: “sh*t, I need a premium account to get all my data. Damn you Fitbit!”

At the time of writing the situation is not so catastrophic, but not all sunshine and puppy dogs either. It’s possible to export some data for free (a maximum of 31 days’ worth at a time), but this is the kind of data I find pretty much pointless: it is simply daily aggregated stats like total minutes of sleep, restless count, etc. What I am interested in (and what you should be interested in too) is what is called intraday data, meaning the minute-by-minute report that a Fitbit device generates while active.

So here we are, the key point being that Fitbit is not generous enough; or rather, not fair enough, because that’s my data, and it strikes me as ridiculous that I cannot get all of it without much fuss (e.g. creating an ad-hoc app). Although this is a really important issue, I will not discuss it further here, but you can find many related articles online, as well as many other people simply complaining about it.

While Fitbit had started to personify the final boss of my quest, the open source community of programmers was teaming up with me, unbelievably generous and ready to provide support for free.
You can find many projects, more or less recent and maintained, that can help you scrape all the data you are interested in. I personally relied on Andrew’s solution: it is written in Python, collects data using the graph endpoints of the Fitbit website, and includes a script to get all the data in one go. You simply provide your email and password to the script, which uses them to authenticate and retrieve the credentials for the calls. (I had a look at the code, and it all seemed legitimate to me.)

If you have found a better/quicker solution, or if you know about Fitbit opening up a bit, please let me know and feel free to comment here.

The Analysis

The data from an intraday sleep file looks like this:

2016-03-13 22:46:00,3
2016-03-13 22:47:00,3
2016-03-13 22:48:00,2
2016-03-13 22:49:00,2
2016-03-13 22:50:00,1
...
2016-03-14 07:03:00,1
2016-03-14 07:04:00,1
2016-03-14 07:05:00,1

With two lines of Python code we can now obtain the same data that Fitbit so generously allows us to download from its website, and so much more.
Remember that the values are mapped like this: 0 = none (no measure taken), 1 = sleeping, 2 = restless, 3 = awake.
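As a rough illustration of that claim (my own sketch, not the code from the repository), the standard-library snippet below parses one intraday file and counts the minutes spent in each state, which already gives the kind of daily totals Fitbit exports. It assumes one night per file, one record per minute, and timestamps with plain hyphens as in the sample above; `parse_sleep_file` and `minutes_per_state` are hypothetical helper names. With pandas the same count would be a `read_csv` followed by `value_counts()`.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Value codes of the intraday sleep data (0 means no measure taken).
LABELS = {0: "none", 1: "sleeping", 2: "restless", 3: "awake"}


def parse_sleep_file(text):
    """Parse 'YYYY-MM-DD HH:MM:SS,value' lines into (datetime, int) pairs.

    Assumption: one night per file, one record per minute.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        timestamp, value = row
        records.append((datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"), int(value)))
    return records


def minutes_per_state(records):
    """Count the minutes spent in each state (each record is one minute)."""
    counts = Counter(value for _, value in records)
    return {label: counts.get(code, 0) for code, label in LABELS.items()}


sample = """2016-03-13 22:46:00,3
2016-03-13 22:47:00,3
2016-03-13 22:48:00,2
2016-03-13 22:49:00,2
2016-03-13 22:50:00,1
"""
records = parse_sleep_file(sample)
print(minutes_per_state(records))
# {'none': 0, 'sleeping': 1, 'restless': 2, 'awake': 2}
```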

The first diagram I plotted gives an overall grasp of the most important measures, showing the distribution of each one with a simple histogram.
Sleep efficiency is the percentage ratio between sleeping minutes and total minutes in bed (inefficiency is then simply the complementary percentage).

Histograms showing the distribution of each measure, using the default 10 bins.
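Sleep efficiency and the histogram bins can be sketched as follows, again with hypothetical helpers and under the assumption that every recorded minute counts as a minute in bed. The `histogram` function only mirrors what a plotting library does with its default 10 equal-width bins; in practice `matplotlib.pyplot.hist` or pandas `DataFrame.hist`, both defaulting to `bins=10`, would draw the plot directly.

```python
def sleep_efficiency(values):
    """Percentage of minutes in bed spent sleeping (value 1).

    Assumption: every recorded minute counts as a minute in bed.
    """
    if not values:
        raise ValueError("no records for this night")
    return 100.0 * sum(1 for v in values if v == 1) / len(values)


def histogram(values, bins=10):
    """Counts per equal-width bin between min and max (last bin is closed)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        lo, hi = lo - 0.5, hi + 0.5  # widen a degenerate range, as NumPy does
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # max value goes in the last bin
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Illustrative night: 2 awake, 2 restless and 6 sleeping minutes.
night = [3, 3, 2, 2, 1, 1, 1, 1, 1, 1]
efficiency = sleep_efficiency(night)
inefficiency = 100.0 - efficiency  # the complementary percentage
print(efficiency, inefficiency)  # 60.0 40.0
```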

It feels odd to generalize from exact data, but a few generic observations can give a more intuitive feeling and an immediate understanding. For this dataset, for example, we could observe that:

  • I am awake on average 3 times per night
  • I generally fall asleep in less than 10 minutes
  • I am restless for about 30 minutes per night, for a sleep efficiency of about 90%
  • I spend between 7 and 9 hours in bed, which comes down to an average of 7.30 hours of actual sleep
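Note that a figure like “awake 3 times per night” counts episodes (contiguous runs of awake minutes), not awake minutes. One possible way to derive it, together with the time needed to fall asleep, is sketched below; the helper names are hypothetical and the definitions are my assumptions (for example, the awake stretch before sleep onset is not counted as an awakening), so Fitbit’s own counting may differ.

```python
from itertools import groupby
from statistics import mean

SLEEPING, RESTLESS, AWAKE = 1, 2, 3


def minutes_to_fall_asleep(values):
    """Minutes recorded before the first 'sleeping' minute."""
    for minute, value in enumerate(values):
        if value == SLEEPING:
            return minute
    return len(values)  # never fell asleep


def count_episodes(values, state):
    """Number of contiguous runs of `state` in a night."""
    return sum(1 for key, _ in groupby(values) if key == state)


def awakenings(values):
    """Awake episodes after sleep onset (the initial awake stretch is ignored)."""
    return count_episodes(values[minutes_to_fall_asleep(values):], AWAKE)


# Illustrative nights, one value per minute.
nights = [
    [3, 3, 2, 2, 1, 1, 3, 1, 2, 3, 3, 1],
    [2, 1, 1, 3, 1, 1],
]
print(mean(awakenings(n) for n in nights))  # 1.5
print(mean(minutes_to_fall_asleep(n) for n in nights))  # 2.5
```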

The next diagrams show how measures vary across different time periods. I personally believe that day of week and month are the most relevant ones in this case, and of course, if you have collected enough data, you can start working by year to see how your sleep patterns are changing.

Point plot of different measures by day of week, showing the estimated mean and confidence intervals.

We can speculate a bit on these results. Contrary to what I expected, sleep inefficiency tends to go down toward the weekend and is up and ready again on Monday; it looks like working makes sleep worse. Moreover, TGIF might explain why Friday has the lowest weekly average of minutes before falling asleep.
It’s always tricky to get insight from such data: Thursday, for example, looks like a bad day in terms of restless time, but most likely just because it is the day with the most overall time in bed (total minutes).
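One way to check the Thursday hypothesis is to normalize restless time by time in bed before averaging per weekday. The sketch below assumes a hypothetical per-night summary of (start date, restless minutes, minutes in bed), labels each night with the date it started, and uses illustrative values rather than my actual data; the confidence intervals shown in the point plot are left out.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Fixed English names, so the output does not depend on the locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def by_weekday(nights):
    """Average raw and normalized restlessness per weekday.

    `nights` holds (start_date, restless_minutes, minutes_in_bed) tuples;
    each night is labelled with the date it started (an assumption).
    """
    groups = defaultdict(list)
    for start, restless, in_bed in nights:
        groups[WEEKDAYS[start.weekday()]].append((restless, in_bed))
    return {
        day: {
            "restless_min": mean(r for r, _ in rows),
            # restless minutes per hour in bed: long nights are not penalized
            "restless_per_hour": mean(60 * r / b for r, b in rows),
        }
        for day, rows in groups.items()
    }


nights = [
    (date(2016, 3, 14), 30, 450),  # Monday
    (date(2016, 3, 17), 40, 540),  # Thursday
    (date(2016, 3, 24), 20, 480),  # Thursday
]
for day, stats in by_weekday(nights).items():
    print(day, stats)
```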

Boxplots for a sample of monthly sleep measures.

Monthly stats might help to spot variations caused by new habits or by changes in daily life. Was moving the bed so it faces north a good idea? Going to the gym twice a week? Reading before sleep? What about that increase in partying and related alcohol consumption? Notice that some changes (take the last example) are more likely to simply create outliers than to affect the average in a significant way, unless you are really going for it!
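The outlier point can be made concrete with the same statistics a box plot draws: quartiles, plus points beyond 1.5 times the interquartile range, which is matplotlib’s default whisker length. In this sketch (hypothetical helper, illustrative values), a single short night barely moves the monthly median but pulls the mean down noticeably. The `inclusive` quantile method should correspond to NumPy’s default linear percentile interpolation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, quantiles


def monthly_boxplot_stats(nights):
    """Median, mean and 1.5*IQR outliers of a measure per (year, month).

    `nights` holds (date, value) pairs, e.g. minutes asleep per night.
    Months with fewer than two nights are skipped (quartiles need more data).
    """
    groups = defaultdict(list)
    for day, value in nights:
        groups[(day.year, day.month)].append(value)
    stats = {}
    for month, values in sorted(groups.items()):
        if len(values) < 2:
            continue
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # box plot whisker limits
        stats[month] = {
            "median": median,
            "mean": mean(values),
            "outliers": [v for v in values if v < low or v > high],
        }
    return stats


# Illustrative minutes asleep for six March nights, one of them a party night.
minutes = [400, 410, 420, 430, 440, 200]
nights = [(date(2016, 3, d), m) for d, m in zip(range(1, 7), minutes)]
print(monthly_boxplot_stats(nights)[(2016, 3)])
# median 415.0, mean about 383.3, outliers [200]
```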

A day-by-day diagram can also help to spot more general trends. How do you sleep now that you have moved to a new house, to another country, or in with someone else?

Barplot for daily stats. The whole date range is considered, leaving gaps where no measures were taken (i.e. I forgot to wear the Fitbit).
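Leaving empty slots for missing days means reindexing the daily stats over the full date range, so a bar chart shows gaps instead of silently joining distant days. Here is a standard-library sketch with a hypothetical helper and illustrative values; in pandas, `Series.reindex` over a `pd.date_range` does the same, filling missing days with NaN.

```python
from datetime import date, timedelta


def fill_date_range(daily):
    """Return (date, value) for every day between the first and last record.

    Days without data get None, so a bar chart leaves an empty slot there.
    """
    if not daily:
        return []
    day, last = min(daily), max(daily)
    filled = []
    while day <= last:
        filled.append((day, daily.get(day)))
        day += timedelta(days=1)
    return filled


daily_minutes_asleep = {date(2016, 3, 13): 420, date(2016, 3, 15): 450}
for day, minutes in fill_date_range(daily_minutes_asleep):
    print(day.isoformat(), minutes)
# 2016-03-13 420
# 2016-03-14 None
# 2016-03-15 450
```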

Finally, I did some experiments with heatmaps, counting values across all records in order to spot patterns. I was curious to check whether there is a time during the night when I tend to wake up, or when I’m often restless. This can be approached in two ways: using the actual time as the index (x axis), or using minutes as the index, meaning that all data is realigned on the first minute you fall asleep and each successive minute increments the index by one. The second approach can reveal that, for whatever reason, you always tend to wake up 50 minutes after falling asleep, no matter what time you went to sleep. Yes, you are weird, but the more you know...

Heatmap by minutes for the “sleeping” measure. To generate it, the count function was applied over all recorded days, with a column for each minute.
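The two indexing strategies described above can be sketched as two counters: one keyed by clock time (“HH:MM”) and one keyed by the minute offset from the first “sleeping” record. Each counter gives one row of counts for a heatmap. The helpers and example nights are hypothetical, and minutes before sleep onset are skipped in the offset version. With clock time, keep in mind that a night crosses midnight, so the columns need ordering from evening to morning rather than alphabetically.

```python
from collections import Counter
from datetime import datetime, timedelta

SLEEPING, AWAKE = 1, 3


def counts_by_clock_time(nights, state):
    """How many nights had `state` at each clock minute ('HH:MM')."""
    counts = Counter()
    for records in nights:
        for timestamp, value in records:
            if value == state:
                counts[timestamp.strftime("%H:%M")] += 1
    return counts


def counts_by_offset(nights, state):
    """How many nights had `state` N minutes after first falling asleep.

    Minutes before sleep onset are skipped; nights without sleep are ignored.
    """
    counts = Counter()
    for records in nights:
        values = [value for _, value in records]
        if SLEEPING not in values:
            continue
        onset = values.index(SLEEPING)
        for offset, value in enumerate(values[onset:]):
            if value == state:
                counts[offset] += 1
    return counts


def make_night(start, values):
    """Build one (timestamp, value) record per minute (illustrative data)."""
    return [(start + timedelta(minutes=i), v) for i, v in enumerate(values)]


# Two nights starting at different times, both waking 2 minutes after onset.
nights = [
    make_night(datetime(2016, 3, 13, 22, 46), [3, 1, 1, 3, 1]),
    make_night(datetime(2016, 3, 14, 23, 10), [2, 1, 1, 3]),
]
print(counts_by_offset(nights, AWAKE))  # Counter({2: 2})
print(counts_by_clock_time(nights, AWAKE))
```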

As a bonus, here is the plot I generated to check for pairwise relationships, plus the computed correlations (using the Pearson correlation coefficient). The truth is that the results don’t add much to what I already expected: the longer I stay in bed, the more restless I am, and how quickly I fall asleep does not influence any other variable in a relevant way.
We might find more interesting correlations by considering measures other than sleep, like heart rate or daily step count.
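For reference, the Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. A plain-Python sketch of the pairwise matrix follows (hypothetical helpers, illustrative values); pandas `DataFrame.corr()` uses Pearson by default, and Python 3.10+ also ships `statistics.correlation`.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Illustrative per-night measures (not my actual data).
columns = {
    "minutes_in_bed": [420, 480, 510, 450, 540],
    "restless_minutes": [20, 28, 35, 25, 40],
    "minutes_to_fall_asleep": [8, 5, 9, 6, 7],
}
for (a, b), r in correlation_matrix(columns).items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:+.2f}")
```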


There are many more things to analyze, though, and I will surely keep tracking future data to spot if something new is going on with my sleep, and act accordingly. Even more important is the fact that this was just the start of my quest, because, yes, there is a secret I didn’t reveal at the beginning: my real initial purpose was improving my lucid dreaming skills. So if you already know about this practice, or if you want to learn more, stay tuned.

You can find all the code in my GitHub repository. As always, all kinds of comments, critiques, suggestions, contributions and (obviously) corrections are more than welcome.

Data Scientist @ Zalando Dublin - Machine Learning, Computer Vision and Everything Generative ❤
