Deploying MLFlow on Heroku (with Heroku Postgres, S3, and nginx for basic auth)

mlops
Author

Matt Kaye

Published

December 26, 2021

Disclaimer: I initially followed this guide to set up MLFlow on Heroku. However, parts of it are either outdated or no longer work, so this post remedies those issues.

MLFlow

MLFlow is an open source tool for the entire machine learning lifecycle. It lets you create and experiment with models, write notes and descriptions, track parameters, metrics, and artifacts, and deploy to production, all through an easy-to-use API and an intuitive UI. You can make calls to a running MLFlow service through the R, Python, or Java APIs, or from the command line (e.g. with cURL) or your favorite language via the REST API.

Overview

MLFlow is surprisingly easy to set up and deploy to Heroku. There are a few steps to follow to get everything running:

1. Dockerize an MLFlow instance
2. Set up an artifact store
3. Set up a database
4. Secure the instance with basic auth

Prerequisites

This guide assumes you already have AWS (specifically S3) set up. It also assumes some knowledge of shell scripting, Docker, and Heroku.

Creating a Heroku App

First, let’s create a Heroku app from the CLI. For the purposes of this example, I’m going to call the app my-mlflow-example. You’ll need to choose your own app name.

heroku create my-mlflow-example

Next, let’s attach a Heroku Postgres instance.

heroku addons:create heroku-postgresql:hobby-dev --app my-mlflow-example

Creating this hobby-dev Heroku Postgres instance will also automatically set the DATABASE_URL environment variable in your app’s configuration.

Next, we’ll need some other environment variables to be set. You can set your config like this:

heroku config:set \
  S3_URI=s3://YOUR-S3-URI \
  AWS_SECRET_ACCESS_KEY=YOUR-SECRET \
  AWS_ACCESS_KEY_ID=YOUR-KEY \
  AWS_DEFAULT_REGION=YOUR-REGION \
  MLFLOW_TRACKING_USERNAME=YOUR-USERNAME \
  MLFLOW_TRACKING_PASSWORD=YOUR-PASSWORD \
  --app my-mlflow-example

You can avoid retyping these by creating a .env file that looks like this:

S3_URI=s3://YOUR-S3-URI AWS_SECRET_ACCESS_KEY=YOUR-SECRET ...

and then running heroku config:set $(cat .env) --app my-mlflow-example, which expands the file's contents into the arguments of config:set.
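To see how that expansion works without touching a real Heroku app, here is a minimal sketch. The bucket name and region are placeholders, and echo is used so the command is printed rather than actually run:

```shell
# Write a one-line .env in the same space-separated layout as above
printf 'S3_URI=s3://my-bucket AWS_DEFAULT_REGION=us-east-1' > .env

# $(cat .env) splits the file's contents into KEY=VALUE arguments;
# echo shows the command that would be executed:
echo heroku config:set $(cat .env) --app my-mlflow-example
# prints: heroku config:set S3_URI=s3://my-bucket AWS_DEFAULT_REGION=us-east-1 --app my-mlflow-example
```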

Great! At this point, your Heroku app should be all set up. Now all we need to do is create the Docker image that will run MLFlow.

Setup

First things first: let’s create a folder called my-mlflow-example where we’ll store all the files we’ll need. I’ll do that with:

mkdir my-mlflow-example && cd my-mlflow-example

Next, let’s create a few files we’ll need:

touch Dockerfile run.sh requirements.txt nginx.conf_template

You’ll also want to make your run.sh script executable.

chmod +x run.sh

Great, now we have all the files we’ll need to deploy our MLFlow instance!

The Dockerfile

The Dockerfile will contain all of the library installs and files you need to run your MLFlow instance. The Dockerfile you’ll need to write for MLFlow should look something like this:

FROM continuumio/miniconda3

## Copy files into the image
COPY run.sh run.sh
COPY requirements.txt requirements.txt
COPY nginx.conf_template /etc/nginx/sites-available/default/nginx.conf_template

## Install Postgres
RUN apt-get -y update && \
  apt-get -y upgrade && \
  apt-get install -y postgresql

## Install nginx and dependencies
RUN apt-get -y update && \
  apt-get install -y make vim \
  automake gcc g++ subversion \
  musl-dev nginx gettext apache2-utils

## Install pip and dependencies
RUN conda install -c anaconda pip && \
  pip install --upgrade pip && \
  pip install -r requirements.txt && \
  conda update -n base -c defaults conda && \
  conda env list && \
  pip freeze

## Run your `run.sh` script on container boot
CMD ./run.sh

The requirements.txt File

Next, copy the following lines into your requirements.txt:

mlflow
psycopg2-binary
boto3

This file tells pip which libraries to install in your Docker image via the RUN pip install -r requirements.txt line above.
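Unpinned requirements like these will pull the latest releases, which can break a rebuild months later. If you want reproducible builds, pin versions; the ones below are examples from around the time of writing, so check PyPI for the versions you actually want:

```text
mlflow==1.22.0
psycopg2-binary==2.9.2
boto3==1.20.24
```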

The nginx Template

Next, we’ll create a template for the nginx.conf file that we’ll eventually use in the container for basic auth. One important issue here: This file is a template because Heroku randomly assigns a port on dyno start, which means we can’t hard-code any ports for nginx (since we don’t know them ahead of time). Instead of hard-coding, we specify a couple of placeholders: $HEROKU_PORT and $MLFLOW_PORT. We’ll replace these with the proper ports on container startup.

Your nginx.conf_template should look like this:

events {}
http {
  server {
        listen $HEROKU_PORT;

        access_log /var/log/nginx/reverse-access.log;
        error_log /var/log/nginx/reverse-error.log;

        location / {
            auth_basic "Restricted Content";
            auth_basic_user_file /etc/nginx/.htpasswd;

            proxy_pass                          http://127.0.0.1:$MLFLOW_PORT/;
            proxy_set_header Host               $host;
            proxy_set_header X-Real-IP          $remote_addr;
            proxy_set_header X-Forwarded-For    $proxy_add_x_forwarded_for;
        }
    }
}

Run Script

Finally, let’s create a script, run.sh, which will run when the Heroku dyno starts.

export HEROKU_PORT="$PORT"
export MLFLOW_PORT=5000

envsubst '$HEROKU_PORT,$MLFLOW_PORT' < /etc/nginx/sites-available/default/nginx.conf_template > /etc/nginx/sites-available/default/nginx.conf

htpasswd -bc /etc/nginx/.htpasswd $MLFLOW_TRACKING_USERNAME $MLFLOW_TRACKING_PASSWORD

killall nginx

mlflow ui \
  --port $MLFLOW_PORT \
  --host 127.0.0.1 \
  --backend-store-uri $(echo "$DATABASE_URL" | sed "s/postgres/postgresql/") \
  --default-artifact-root $S3_URI &

nginx -g 'daemon off;' -c /etc/nginx/sites-available/default/nginx.conf

There are a few things happening here:

1. We set the HEROKU_PORT environment variable from the randomly assigned PORT that Heroku creates on dyno startup.
2. We assign MLFLOW_PORT to 5000. The chances that Heroku assigns exactly port 5000 are low. If you want, you can add a few more lines to first check whether Heroku assigned port 5000, and if it did, choose another unprivileged port instead (e.g. 5001).
3. We substitute the port placeholders in our nginx.conf_template file with the actual environment variable values, so that nginx knows where to listen and where to direct traffic.
4. We create a new .htpasswd file from the MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD environment variables that we set in the Heroku config. nginx checks submitted usernames and passwords against this file.
5. We start the MLFlow UI in the background, telling it to run on the MLFLOW_PORT we chose, and pointing it at our Heroku Postgres instance for the backend store and our S3 bucket for artifact storage. Note: By default, Heroku Postgres provides a URL that begins with postgres. This scheme is not compatible with SQLAlchemy, so we substitute postgresql (which is) for postgres.
6. We start nginx for basic auth.
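Two of the steps above can be sketched in isolation: the optional port guard and the DATABASE_URL scheme rewrite. All of the values below are made up for illustration:

```shell
# Optional guard: if Heroku happened to assign port 5000,
# fall back to another port for MLFlow (5001 here is arbitrary)
PORT=5000          # pretend Heroku assigned exactly 5000
MLFLOW_PORT=5000
if [ "$PORT" = "$MLFLOW_PORT" ]; then
  MLFLOW_PORT=5001
fi
echo "$MLFLOW_PORT"
# prints: 5001

# Scheme rewrite: sed swaps the first "postgres" for "postgresql",
# which is the scheme SQLAlchemy expects (credentials are fake)
DATABASE_URL="postgres://user:secret@host:5432/dbname"
echo "$DATABASE_URL" | sed "s/postgres/postgresql/"
# prints: postgresql://user:secret@host:5432/dbname
```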

Deploying

Deployment is simple, and can be done with a few lines of bash:

heroku login
heroku container:login
heroku container:push web --app my-mlflow-example
heroku container:release web --app my-mlflow-example

And that’s it! Running those four lines should build your Docker image, push it to Heroku, and release it as your app. Once it releases, Heroku will boot up a dyno and you should be able to go to my-mlflow-example.herokuapp.com and see the MLFlow UI.

Using MLFlow

Now that your instance is deployed, you should have no problem using MLFlow for your whole ML lifecycle. For example, in R you might want to create a new experiment. You could do something like this:

## install.packages("mlflow")

library(mlflow)

## Set up some environment variables
## This way, your R session will know where to
##  look for your MLFlow instance and will have
##  the proper credentials set up
Sys.setenv(
  MLFLOW_TRACKING_URI = "https://my-mlflow-example.herokuapp.com",
  MLFLOW_TRACKING_USERNAME = "YOUR-USERNAME",
  MLFLOW_TRACKING_PASSWORD = "YOUR-PASSWORD"
)

mlflow_create_experiment("my-first-experiment")

And with that, you should be able to harness all of the awesome power of MLFlow for all of your ML lifecycle needs!