Run Dashboard

Posted on Sun 26 May 2024 in Projects

I spent the better part of last year training for and running marathons. I finally ran a time fast enough to qualify for a World Marathon Major and plan to run the Chicago Marathon next year! In the meantime, I am taking a small break from training to enjoy some other hobbies and relax a bit after a tough training year. While I am not running as much, I thought a quick dashboard summarizing my running over a period of time would be a good way to make sure my running stays balanced while I'm not following a formalized training program. I don't want to be running all easy runs; I want to mix in some long runs and speed workouts too! So I decided to build a Tableau dashboard that pulls data from my Strava account so I can quickly see how many miles I'm running, the weekly/monthly trends, and the distribution of runs by type.

Here's a link to the GitHub repo that houses the code for the API.

There are certainly some improvements I can make and some tinkering I can do, so this project should eat up a good chunk of the "bored" time I have. I am hoping to configure the linter for this project so that it runs in the GitHub Actions workflow during the test action. I am also working on the Kubernetes configs and hoping to have test and live environments managed through Helm charts.

API Overview

The data flow starts when I record a run in Strava. This usually happens automatically: I use my Garmin Forerunner watch to time a run, and the results and stats sync from Garmin to Strava. I then go into Strava and update the run that was posted, changing its title (if I'm following a program, I'll rename it to the workout type, such as long run, threshold run, or recovery run) and adding the shoes used for the run. I collect shoe data for each run because I want to start tracking shoe mileage in the dashboard in a future iteration; for now, shoe data is collected but not really used in the dashboard.

After the run is updated, Strava sends a webhook to my API. I registered a Route 53 domain and hosted zone that point to my Kubernetes cluster ingress, which I've configured separately. The cluster is hosted on DigitalOcean, as it is a pretty cheap way to maintain a cluster. In the repo read-me, I have a to-do task to include the Kubernetes Helm and config files so I can start automating the Kubernetes configuration. The Strava webhook sends the following payload:

{
    aspect_type: 'update',
    event_time: 0000000000,
    object_id: 00000000000,
    object_type: 'activity',
    owner_id: 000000000,
    subscription_id: 000000,
    updates: { title: 'My Run' }
}

From there, the API takes the object_id from the payload and sends a request to Strava to get the information related to the activity. The API first retrieves the OAuth token, determines whether it needs to refresh the token based on its expiration time, and then requests the data from Strava. The API then takes the relevant data fields and stores them in the database. I am using a Postgres database hosted on a droplet in DigitalOcean that can be accessed from the cluster.
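
Here's a simplified sketch of that flow, not the exact code in the repo (the token refresh and activity endpoints are Strava's documented ones; the pg pool setup is a placeholder and error handling is trimmed):

const { Pool } = require('pg');
const pool = new Pool(); // connection details come from PG* environment variables

// Look up the stored OAuth token, refresh it if expired, then fetch
// the full activity from Strava. Assumes a token row already exists.
async function fetchActivity(objectId, ownerId) {
  const { rows } = await pool.query(
    'SELECT access_token, refresh_token, expires_at FROM api_data.oauth_tokens WHERE owner_id = $1',
    [ownerId]
  );
  let token = rows[0];

  // Strava's expires_at is a Unix timestamp in seconds.
  if (token.expires_at <= Math.floor(Date.now() / 1000)) {
    const res = await fetch('https://www.strava.com/oauth/token', {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        client_id: process.env.STRAVA_CLIENT_ID,
        client_secret: process.env.STRAVA_CLIENT_SECRET,
        grant_type: 'refresh_token',
        refresh_token: token.refresh_token,
      }),
    });
    token = await res.json();
    await pool.query(
      'UPDATE api_data.oauth_tokens SET access_token = $1, refresh_token = $2, expires_at = $3 WHERE owner_id = $4',
      [token.access_token, token.refresh_token, token.expires_at, ownerId]
    );
  }

  // Fetch the full activity details with the (possibly refreshed) token.
  const res = await fetch(
    `https://www.strava.com/api/v3/activities/${objectId}`,
    { headers: { Authorization: `Bearer ${token.access_token}` } }
  );
  return res.json();
}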

The API additionally sends a GET request to fetch the shoe information for the activity that was recorded and stores it in a separate table. The shoe data is currently not used or displayed on any dashboard, but I am hoping to do some other projects with it in the future.
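
The gear lookup is similar; a trimmed sketch using Strava's documented gear endpoint (the function name is just for illustration):

// gear_id comes off the activity payload fetched above.
async function fetchGear(gearId, accessToken) {
  const res = await fetch(`https://www.strava.com/api/v3/gear/${gearId}`, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  return res.json(); // contains brand_name, model_name, distance, etc.
}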

Configuring Strava

I was able to set up webhooks using Strava's webhook documentation. One thing to note: when creating a subscription for your Strava app, it defaults to a read scope, but you need read_all to get webhooks properly set up. I initially had some issues where the confirmation webhook in the instructions wouldn't send, and it was due to the limited scope I had set.
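
For reference, creating the subscription is a one-time POST to Strava's push subscriptions endpoint, roughly like this (the callback URL and verify token are placeholders, and the read_all scope itself is granted earlier, during the OAuth authorization step):

// One-time setup script; run once after deploying the callback endpoint.
// Strava immediately sends a GET validation request to callback_url
// before confirming the subscription.
async function createSubscription() {
  const res = await fetch('https://www.strava.com/api/v3/push_subscriptions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      client_id: process.env.STRAVA_CLIENT_ID,
      client_secret: process.env.STRAVA_CLIENT_SECRET,
      callback_url: 'https://example.com/webhook', // my API's public endpoint
      verify_token: 'STRAVA',                      // arbitrary string echoed back
    }),
  });
  console.log(await res.json()); // returns the subscription id on success
}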

I have turned off the initial webhook validation since I am only recording activities for myself and not registering any other owners. The code to validate the webhook is commented out for now in the app.js file.
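
For anyone curious, the validation handler is tiny; a minimal sketch of what's commented out, assuming Express (the route path and env var name are placeholders):

const express = require('express');
const app = express();

// Strava validates the callback_url with a GET request and expects the
// hub.challenge value echoed back as JSON within two seconds.
app.get('/webhook', (req, res) => {
  const mode = req.query['hub.mode'];
  const token = req.query['hub.verify_token'];
  if (mode === 'subscribe' && token === process.env.STRAVA_VERIFY_TOKEN) {
    res.json({ 'hub.challenge': req.query['hub.challenge'] });
  } else {
    res.sendStatus(403);
  }
});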

Data Prep

Postgres Database

As I mentioned above, Postgres is used as the database. I ran the following SQL commands to set up the schema and the necessary tables (in addition to creating users and assigning privileges):

CREATE schema api_data;

CREATE TABLE api_data.webhooks (
    aspect_type text, 
    event_time bigint, 
    object_id bigint, 
    object_type text, 
    owner_id int, 
    subscription_id int, 
    updates text
);

CREATE TABLE api_data.oauth_tokens (
    owner_id INT,
    token_type TEXT,
    access_token TEXT,
    expires_at BIGINT,
    expires_in INT,
    refresh_token TEXT
);
ALTER TABLE api_data.oauth_tokens
ADD CONSTRAINT owner_id_unq UNIQUE(owner_id);

CREATE TABLE api_data.activities (
    object_id BIGINT,
    owner_id INT,
    name TEXT,
    distance INT,
    total_elevation_gain INT,
    type TEXT,
    average_cadence NUMERIC(4, 1),
    average_watts NUMERIC(4, 1),
    average_heartrate NUMERIC(4, 1),
    start_time TIME,
    moving_time INT,
    elapsed_time INT,
    sport_type TEXT,
    gear_id TEXT
);

ALTER TABLE api_data.activities
ADD CONSTRAINT object_id_unq UNIQUE(object_id);

CREATE TABLE api_data.gear (
    gear_id TEXT,
    owner_id INT,
    "primary" boolean,
    resources_state INT,
    distance INT,
    brand_name TEXT,
    model_name TEXT,
    frame_type INT,
    description TEXT
);

ALTER TABLE api_data.gear
ADD CONSTRAINT gear_owner_unq UNIQUE(gear_id, owner_id);
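
Those unique constraints exist so the API can upsert instead of blindly inserting, since Strava re-sends the same object_id on every update event. A rough sketch of the activity upsert with node-postgres (column list trimmed for brevity; pool is the same pool from the earlier sketch):

await pool.query(
  `INSERT INTO api_data.activities (object_id, owner_id, name, distance, gear_id)
   VALUES ($1, $2, $3, $4, $5)
   ON CONFLICT (object_id) DO UPDATE
   SET name = EXCLUDED.name,
       distance = EXCLUDED.distance,
       gear_id = EXCLUDED.gear_id`,
  [activity.id, activity.athlete.id, activity.name, activity.distance, activity.gear_id]
);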

Tableau

The last step, after configuring the API and the database, is to get the data into Tableau. I'm using Tableau Public, which only offers a limited set of data connectors. I played around with delivering the data from the API in an XML OData format, but Tableau does not support the most recent OData features. I decided to go with a Google Sheet that uses the IMPORTXML function to import the data from an exposed GET method on the API. The function sends the request to my API and translates the XML response into a table in the Google Sheet. I configured the Google Sheet to update every day, and Tableau Public refreshes the data from the sheet every day around 8:20. This works for my use case, as I only really do one run a day, so there's no need for more refreshes.
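
The endpoint just serializes activities into a flat XML document; a trimmed sketch (the route name and XML shape are my own choices, and the sheet cell is just an IMPORTXML call pointed at that URL with a matching XPath):

// Serve the stored activities as XML that IMPORTXML can turn into rows.
// Reuses the app and pool from the sketches above; XML escaping omitted.
app.get('/activities.xml', async (req, res) => {
  const { rows } = await pool.query(
    'SELECT object_id, name, distance, moving_time FROM api_data.activities'
  );
  const items = rows
    .map(
      (r) =>
        `<activity><id>${r.object_id}</id><name>${r.name}</name>` +
        `<distance>${r.distance}</distance><moving_time>${r.moving_time}</moving_time></activity>`
    )
    .join('');
  res.type('application/xml').send(`<activities>${items}</activities>`);
});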

CI/CD

The last thing I've worked on is a CI/CD pipeline in GitHub Actions for this project. With a pull request to the main branch, a few workflows start. A CodeQL scan checks for static code vulnerabilities. A linter runs using ESLint, but it is currently not configured and is a to-do on my end! After the linter completes, API tests run using Jest and Supertest. The API tests are configured to run on a self-hosted runner that I have, so they can perform database operations on a test DB and I don't have to expose the database to the internet for GitHub-hosted runners. If the tests pass, the workflow builds and pushes a test image, and I have a test deployment in my Kubernetes cluster for further testing should I need it. After a successful merge to main, the main deploy workflow triggers, builds and pushes the image tagged latest, and updates the live cluster that feeds the Tableau dashboard.
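
The tests themselves are plain Supertest calls against the Express app; a representative sketch (the route and payload mirror the webhook example above, and the module path is a placeholder):

const request = require('supertest');
const app = require('../app'); // the Express app, exported without .listen()

test('webhook event is accepted', async () => {
  await request(app)
    .post('/webhook')
    .send({
      aspect_type: 'update',
      object_id: 123,
      object_type: 'activity',
      owner_id: 456,
      subscription_id: 1,
      updates: { title: 'My Run' },
    })
    .expect(200);
});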

That's it for now! This has been a good project after my marathon, and I'm hoping to use it to learn more about Kubernetes and to improve the CI/CD pipeline for this app. I'm hoping to sit for the Certified Kubernetes Administrator exam, so a project like this helps with practice!