Haskell, Travis, Heroku and Docker – Oh My!

Haskell is a great language to write code in – for all the reasons no-one needs to list here. Its combination of concision, safety and performance is a delight to work with. What is less delightful is the deployment experience, and that is hardly surprising: it is a compiled language, so build times can be frustratingly long, and only the most generous observer would call it mainstream, so little work has really gone into optimising the deployment cycle for Haskell on modern PaaS devops systems. While every CI system and PaaS worth its salt has a 5-minute getting-started tutorial for JavaScript and Ruby, the adventurous modern Haskeller is often out on a limb when it comes to finding good tools and simple guides that provide the kind of devops experience others take for granted.

The problems

We have a fairly common set-up for a modern server-room-less start-up: everything hosted in the cloud; continuous integration (on Travis-CI) and continuous deployment (on Heroku) of tested code; zero-time deploys and automatic scaling. And a very pleasant world it is. Into this world we are now looking to bring our first Haskell micro-service, to handle CPU-bound tasks under high load. It has been a great choice: the performance is frankly astonishingly good, and it is making it possible to do things we were struggling to scale until recently.

However, running a vanilla cabal-based build on Travis for even a moderately complex Haskell project makes for excruciatingly slow builds; not because of compilation itself, but simply because pulling in even a small number of (fairly standard) cabal dependencies will have the poor Travis build server churning away for 15 minutes as it tries to build half of Hackage. Every time. 20-30 minute build times on Travis do rather reduce the “continuous” nature of any CI system. On top of that, slow-running builds block our build queue for all the other developers, which in turn blocks deployment, and when you deploy several times a day over dozens of services, that is a big problem.

And even once you have run the tests, you have an even bigger problem deploying to Heroku. Heroku kills builds that take longer than about 30 minutes to run, so they won’t even deploy. And you certainly can’t rely on cabal run to do the build for you, since that would mean massive down-time every time you release. In fact it wouldn’t even get that far – Heroku kills any release that doesn’t bind to its assigned port within a reasonable time of launch (about 1 minute); this means the service must be built before launch, and it cannot be built as part of the release process.
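To make the port constraint concrete, here is a minimal sketch of the kind of launch wrapper a Procfile might point at. The /app/server path and the 3000 fallback are illustrative assumptions, not details from our actual setup:

```shell
#!/bin/bash
# Heroku injects the assigned port via the PORT environment variable,
# and kills any release that has not bound to it within about a minute.
# There is no time to compile here - the binary must already exist.
PORT="${PORT:-3000}"  # assumed local-development fallback
echo "launching prebuilt server on port $PORT"
# The real invocation would be something like (path is hypothetical):
#   exec /app/server --port "$PORT"
```

Because the wrapper does nothing but hand off to a prebuilt binary, it binds well inside Heroku’s launch window.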

Solutions

We don’t claim to have definitive solutions for all this – we are still iterating – but after trying a couple of different tools, we think we have found something that works well right now.

Our first approach was to use Halcyon, a system designed for managing Haskell on Heroku. And it works. No question. You can deploy even large, complex systems on Heroku without busting your build budget. But there is a catch: it achieves this feat by not building the app at all, instead deploying an empty slug after scaling the app down to no dynos. You then have to launch a one-off build dyno to compile the application, after which you can restart the app and use the compiled slug. Thankfully you don’t have to go through a long rebuild on every release – only when the sandbox itself has changed. However, since most of our logic lives in a library wrapped in a thin web-service wrapper, the library is the most volatile part, and this release procedure doesn’t save us much. Even worse, this system is not suitable for continuous deployment: you don’t know whether the release will work when you make it, and if a rebuild is needed, you have a dead service while you perform the manual rebuild. So much for zero downtime.

A lot of these problems were mitigated by using Heroku pipelines, which perform no builds and instead just reuse existing slugs. This means we can build on staging, verify the deployment, and just click promote to get instantaneous zero-time deployment, as needed. Magic.
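The promotion itself is a single CLI call. The app name below is a placeholder, and the command is guarded so this sketch is a no-op on machines without the Heroku toolbelt:

```shell
#!/bin/bash
# Promote the slug that was built and verified on staging. Heroku copies
# the existing slug to the downstream app, so no build runs and the
# deploy is effectively instant. "my-app-staging" is a placeholder.
STAGING_APP="my-app-staging"
echo "promoting tested slug from $STAGING_APP"
if command -v heroku >/dev/null 2>&1; then
  heroku pipelines:promote --app "$STAGING_APP"
fi
```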

This still leaves the issue of unattended builds stopping us from using continuous deployment. The solution for this was to drop the Halcyon approach altogether, and use Heroku’s Docker support for deployment. This comes at the cost of a bit of extra complexity in the project itself, as you have to be completely explicit about the build procedure in the form of the Dockerfile. On the other hand, the Dockerfile then serves both as a reproducible build environment and as comprehensive documentation about how to build and run the system, which is invaluable. And even better, it is executable documentation, which is the very best kind. We get Travis to act as a build server, building and deploying the Docker image in an after_success hook executed when the tests pass. This has the additional benefit of confirming that our code, and specifically the Dockerfile, is in a deployable state. Docker-based deployments are instant, with no build phase on Heroku, so we offload the pain of compilation to Travis, where we have to compile anyway, and get painless deployment in return.

This leaves us with the issue of long builds on Travis, which, as mentioned, is more than a nuisance – it can actually block others from getting their work done. Thankfully the solution here is pretty straightforward: cache the build. Initially we used Halcyon for this as well, since the whole point of Halcyon is that it produces reliable, reproducible builds, which it uploads to your private S3 bucket and reuses when your dependencies are unchanged. However, the only dependencies it takes into account are your project dependencies, not your test dependencies, so while you can cut your build time by 50%, you can’t eliminate it altogether – at least not if you want fancy things like hspec.

Once we stopped using Halcyon on Heroku, this led us to rethink caching on Travis. As usual, simpler is better, and Travis supplies its own caching solution for private builds: caching the sandbox brings a 30-minute build down to 3 minutes, and we only ever have to rebuild what has changed, not the entire sandbox. Much as the use of Heroku-Docker comes at the cost of increased explicitness, abandoning Halcyon means a more verbose .travis.yml file, since if we want nice things like specific GHC versions we have to handle them ourselves.

Enough Talk! Show Me Code!

So now we have a Travis build file that installs its own versions of GHC and caches the sandbox:

# We will manage the build system ourselves, since travis doesn't
# have a wide range of GHC versions.
language: sh

# We need docker for managing deployment, thankfully Travis provides this as a service
services:
  - docker

# We need sudo, both for installing stuff, and to use docker.
sudo: required

# Pin dependencies to these versions.
env:
  global:
    - CABALVER=1.22
    - GHCVER=7.10.1
    - HAPPYVER=1.19.5
    - ALEXVER=3.1.4
    - secure: "...redacted..." # HEROKU_API_KEY, used in after_success hook.

# Cache the sandbox so that we get faster builds.
cache:
  directories:
    - '.cabal-sandbox'

# We use the popular hvr/ghc PPA to install pinned versions of dependencies
before_install:
  - travis_retry sudo add-apt-repository -y ppa:hvr/ghc
  - travis_retry sudo apt-get update
  - travis_retry sudo apt-get install cabal-install-$CABALVER ghc-$GHCVER
  - export PATH=/opt/ghc/$GHCVER/bin:/opt/cabal/$CABALVER/bin:$PATH
  - travis_retry sudo apt-get install alex-$ALEXVER
  - export PATH=/opt/alex/$ALEXVER/bin:$PATH
  - travis_retry sudo apt-get install happy-$HAPPYVER
  - export PATH=/opt/happy/$HAPPYVER/bin:$PATH
  - cabal update

# Create and build the sandbox. We have a private dependency, so we add it
# to the sandbox here.
install:
  - cabal sandbox init && cabal sandbox add-source MY_DEPENDENCY
  - cabal install --only-dependencies --enable-tests
  - cabal configure --enable-tests
  - cabal build

# Bog-standard test script.
script:
  - cabal test

# When we have succeeded, then release to staging. Then, when QA'ed,
# the staging app will be promoted to production.
after_success:
  - .travis/after_success.sh

# Don't build every commit - only PRs and when new code lands in master
branches:
  only:
      - master

And then we deploy to Heroku in our after_success hook:

#!/bin/bash

set -e # Abort script at first error
set -u # Disallow unset variables

# Only run when not part of a pull request and on the master branch
if [ "$TRAVIS_PULL_REQUEST" != "false" -o "$TRAVIS_BRANCH" != "master" ]
then
    echo "Skipping deployment on branch=$TRAVIS_BRANCH, PR=$TRAVIS_PULL_REQUEST"
    exit 0;
fi

# Install the toolbelt, and the required plugin.
wget -qO- https://toolbelt.heroku.com/install-ubuntu.sh | sh
heroku plugins:install heroku-docker

# Build and release the application.
# To give access to your Heroku apps, you
# need to set the HEROKU_API_KEY environment variable.
heroku docker:release --app "$STAGING_APP"

And then the final element of this arrangement is the Dockerfile, which specifies the build process:

# We derive our build from the Heroku base image.
FROM heroku/cedar:14

# Stuff taken from the Dockerfile for the `haskell:7.10` base image. #########
## ensure locale is set during build
ENV LANG C.UTF-8
ENV GHCVER 7.10.2
ENV CABALVER 1.22

# Install nodejs (for client side code) and GHC (for the server)
RUN apt-get update && apt-get install -y --no-install-recommends software-properties-common
RUN add-apt-repository -y ppa:chris-lea/node.js
RUN add-apt-repository -y ppa:hvr/ghc
RUN apt-get update
RUN apt-get install -y --no-install-recommends \
        python-software-properties git build-essential \
        nodejs \
        cabal-install-$CABALVER ghc-$GHCVER \
        happy-1.19.5 alex-3.1.4 \
        zlib1g-dev libtinfo-dev libsqlite3-0 \
        libsqlite3-dev ca-certificates g++ && \
    rm -rf /var/lib/apt/lists/*

ENV PATH /root/.cabal/bin:/opt/cabal/1.22/bin:/opt/ghc/7.10.2/bin:/opt/happy/1.19.5/bin:/opt/alex/3.1.4/bin:/app/.cabal/bin:$PATH

# We are obligated to build within the /app directory, so make this our user's home.
RUN useradd -d /app -m app
USER app
RUN cabal update

WORKDIR /app

ENV HOME /app
ENV PORT 3000

# Heroku builds the slug here
RUN mkdir -p /app/heroku
# We build the server here
RUN mkdir -p /app/src
# We run the app from here.
RUN mkdir -p /app/user
RUN mkdir -p /app/.profile.d

##############################
# Build and install the server
##############################

WORKDIR /app/src

COPY LICENSE LICENSE
COPY cabal.config cabal.config
COPY server.cabal server.cabal
COPY Setup.hs Setup.hs

# Install application dependencies
RUN cabal install --only-dependencies

# Main src code
COPY src src
# The main entry point for the server
COPY app app
# Any resources needed at build time.
COPY resources resources

# Build the server. We must make sure the executable is located under /app/
# (&& rather than ; so a failed build fails the image build)
RUN cabal install && cp /app/.cabal/bin/server /app/server

#################################
# Install run time dependencies
#################################

WORKDIR /app/user

# resources contains all the HTML, JS and CSS files needed, so make it available
# there at run time.
RUN cp -r /app/src/resources /app/user/

# Add files NPM needs to run.
COPY package.json package.json
COPY .bowerrc .bowerrc
COPY bower.json bower.json

# Install and build any client side dependencies
RUN npm install

At the end of all this we have an app we can develop and run as normal (using cabal build|test|run), either locally or in Docker, and have it tested and deployed continuously, with zero downtime. Win!
