Docker Desktop Filesystem Caching: Faster with Mutagen

I started down the Docker path for my local dev environment six years ago. As soon as an alpha version of Docker for Mac was available, I installed it to replace my boot2docker-based VM. I mentioned at the time that its one major drawback was performance of the osxfs filesystem. All these years later, and it’s still sluggish compared to the native filesystem.

I’ve tried various solutions to mitigate the issue. NFS volumes gave a minor performance boost for some applications, but the effect was negligible for my own dev experience. I tried docker-sync for a while, but constantly ran into problems with the sync lagging or stalling. Mutagen seemed similarly promising, but suffered from the same issues.

Exciting News!

‘Twas with great delight that I read the announcement that the Docker Desktop team would finally be implementing a solution. That brings us, a couple months later, to today, where I’ve had the opportunity to test it out.

The syncing solution is built on top of Mutagen. Though I’ve had my issues with it in the past, I’m hopeful that the Docker Desktop team’s official blessing and support will help the tool become efficient, stable, and reliable. It took a little bit of troubleshooting to get to a working installation, so I thought it best to document my steps.

Configuration

To start with, the official “Edge” release of Docker Desktop for Mac is outdated (yup, that’s what I said). Instead, you can find links to newer versions in the GitHub forums. Today I’m running on build 45494, which I found in a discussion about excluding files from the sync. This build resolves two key issues that I ran into with the Edge release. First, it opens up file permissions on the synced files to resolve write permission errors. Second, it adds support for a global Mutagen config.

The Mutagen config is an essential tool for excluding certain files/directories from the sync. In my particular case, I don’t want my node_modules directories to sync. I use nvm and run my node commands on my host machine. Excluding these directories can cut a large chunk off of the synchronization time. So I created my config file at ~/.mutagen.yml with the following rules:

sync:
  defaults:
    ignore:
      vcs: true
      paths:
        - node_modules

Only after this file is in place can I configure caching according to the documentation. If you enable it beforehand, you’ll have to remove the directory from your config, restart Docker Desktop, and then re-add it.

Troubleshooting

I ran into some errors with symlinks in my project directory. Mutagen will complain and refuse to sync if there are absolute symlinks in the cached directory. Fortunately, I was able to remove them from my current projects. Otherwise, an option might have been to use the global config to ignore them.
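Had I needed it, that would have looked like adding the symlinked paths to the same ignore list in ~/.mutagen.yml; a minimal sketch, with a placeholder path:

sync:
  defaults:
    ignore:
      vcs: true
      paths:
        - node_modules
        - path/to/offending-symlink # hypothetical entry for an absolute symlink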

The debugging output is not particularly helpful. When Mutagen encounters an error, all you get is an “Error” status in the File Sharing settings of Docker Desktop. Another comment in the forum showed me the proper path to viewing the error. The docker daemon’s HTTP API will show the state of the sync, along with any error messages (note that jq is here to make the output prettier).

curl -X GET --unix-socket ~/Library/Containers/com.docker.docker/Data/docker-api.sock http://localhost/cache/state | jq

With all of my errors resolved, I can now start up my containers with the synced directories. The application performance is noticeably faster, with WordPress pages loading in a few hundred milliseconds instead of a few seconds. It also shaved about 80-90% off the total time to run my automated test suites, compared to running them over the osxfs mounts.

[Screenshot: Docker Desktop file sharing settings]

After a couple of days running, I haven’t seen any show-stopping issues with this new caching. Nice work, Docker Desktop team. I’m looking forward to watching this tool stabilize and improve.

Update (2020-07-07): The latest edge version of Docker Desktop makes this even simpler. By using the delegated mount strategy, Mutagen will be automatically enabled for the directory. According to the discussion, future versions will also allow one to disable the Mutagen caching for a directory by explicitly setting the consistent or cached strategy.
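For reference, here’s a minimal sketch of a delegated bind mount in docker-compose.yml; the service name, image, and paths are placeholders:

  php:
    image: php:7.4-fpm
    volumes:
      # in this edge build, the "delegated" consistency flag is what enables Mutagen caching for the directory
      - ./:/var/www/html:delegated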

nginx as HTTPS proxy for Elasticsearch

Since Elasticsearch is exposed via an HTTP API, we can use our nginx server to proxy Elasticsearch requests over HTTPS.

Let’s say you have your local dev environment configured to use SSL. Your dev site is accessible at https://mysite.dev/. Wonderful! Now you need to add Elasticsearch to your project. Let’s add it to docker-compose.yml, something like:

version: "2"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.2.2
    environment:
      - xpack.security.enabled=false
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    mem_limit: 1g
    volumes:
      - elasticsearchindex:/usr/share/elasticsearch/data
    ports:
      - "9200"
    network_mode: "bridge"
  # and some other services like PHP, nginx, memcached, mysql
volumes:
  elasticsearchindex:

How do you make requests to Elasticsearch from the browser?

Option 1: Set up a proxy in your app. This probably resembles what you’ll ultimately get in production. You don’t really need any security on Elasticsearch for local dev, but in production it will need some sort of access control so users can’t send arbitrary requests to the server. If you’re not using a third-party service that already handles this for you, this is where you’ll filter out invalid or dangerous requests. I prefer to let more experienced hands manage server security for me, though, and this is a lot of overhead just to set up a local dev server.

Option 2: Expose Elasticsearch directly. Since I don’t need security locally, I could just open up port 9200 on my container and make requests directly to it from the browser at http://localhost:9200/. Notice the protocol there, though. If my local site is at https://mysite.dev/, then the browser will block insecure requests to Elasticsearch.

Option 3: Use nginx as a proxy. I’m already using a reverse proxy in front of my project containers. It terminates the SSL connections and then passes through unencrypted requests to each project’s nginx server. The project’s nginx container doesn’t need to deal with SSL. It listens on port 80 and passes requests to PHP with fastcgi.

server {
	listen 80 default_server;
	server_name mysite.dev;
	# ... more server boilerplate
}

Since Elasticsearch is exposed via an HTTP API, we can create another server block to proxy Elasticsearch requests. First, make sure the nginx container can talk to the Elasticsearch container. In docker-compose.yml:

  nginx:
    image: nginx:stable-alpine
    environment:
      - VIRTUAL_HOST=mysite.dev,*.mysite.dev
    volumes:
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf:ro
      - ./nginx/elasticsearch-proxy.conf:/etc/nginx/conf.d/elasticsearch-proxy.conf:ro
      - ./nginx/php.conf:/etc/nginx/php.conf:ro
    links:
      - php
      - elasticsearch
    ports:
      - "80"
    network_mode: "bridge"

And then create elasticsearch-proxy.conf to handle the requests:

upstream es {
	server elasticsearch:9200;
	keepalive 15;
}

server {
	listen 80;
	server_name search.mysite.dev;

	location / {
		proxy_pass http://es;
		proxy_http_version 1.1;
		proxy_set_header Connection "Keep-Alive";
		proxy_set_header Proxy-Connection "Keep-Alive";
	}
}

Now we can make requests to Elasticsearch from the browser at https://search.mysite.dev/. The nginx proxy will handle the SSL termination, and communicate with Elasticsearch using its standard HTTP API.
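A quick sanity check from the command line (assuming search.mysite.dev resolves to your proxy; add -k if curl doesn’t trust your locally generated certificate):

curl "https://search.mysite.dev/_cluster/health?pretty"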

Create and Trust Local SSL Certificate

Automate the creation of locally trusted SSL certificates for use with Docker-based development environments

I use Jason Wilder’s nginx reverse proxy container as the gateway to my various Docker dev environments. Among its other services, it provides SSL termination, so I don’t need to worry about configuring SSL in every container I run.

The setup is pretty simple. Make a directory of certificates and mount it into the container at /etc/nginx/certs. In docker-compose.yml, it would look something like:

version: "2"
services:
  proxy:
    image: jwilder/nginx-proxy
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/certs:/etc/nginx/certs
      - /var/run/docker.sock:/tmp/docker.sock

You’ll need to create a new certificate for each domain you want to serve. Add them to the certs dir, and the proxy will find them and serve those domains with SSL.
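As I understand the proxy’s convention, it matches certificates to virtual hosts by filename, so for mysite.dev the directory should end up with mysite.dev.crt and mysite.dev.key, which is exactly what the script below produces:

ls nginx/certs
create-cert.sh  mysite.dev.crt  mysite.dev.key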

I’ve created a script that will create the certificate and, on OS X at least, add it to your login keychain as a trusted certificate so you can avoid SSL warnings from your browser. Create the file create-cert.sh in your certs directory and run it from there. E.g., certs/create-cert.sh mysite.dev.

#!/bin/bash
 
 
CERTDIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
DOMAIN=$1
 
if [ $# -lt 1 ]; then
  echo 1>&2 "Usage: $0 domain.name"
  exit 2
fi
 
cd ${CERTDIR}
 
# generate a temporary OpenSSL config with a wildcard SAN for the domain
cat > ${DOMAIN}.cnf <<-EOF
  [req]
  distinguished_name = req_distinguished_name
  x509_extensions = v3_req
  prompt = no
  [req_distinguished_name]
  CN = *.${DOMAIN}
  [v3_req]
  keyUsage = keyEncipherment, dataEncipherment
  extendedKeyUsage = serverAuth
  subjectAltName = @alt_names
  [alt_names]
  DNS.1 = *.${DOMAIN}
  DNS.2 = ${DOMAIN}
EOF
 
# create a ten-year, self-signed certificate and key
openssl req \
  -new \
  -newkey rsa:2048 \
  -sha256 \
  -days 3650 \
  -nodes \
  -x509 \
  -keyout ${DOMAIN}.key \
  -out ${DOMAIN}.crt \
  -config ${DOMAIN}.cnf
 
rm ${DOMAIN}.cnf
 
# on macOS, add the cert to the login keychain as trusted so browsers accept it
if [[ $OSTYPE == darwin* ]]; then
  sudo security add-trusted-cert -d -r trustRoot -k $HOME/Library/Keychains/login.keychain ${DOMAIN}.crt
fi

Reload your proxy, and you can now visit https://mysite.dev/.
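In my case, reloading is just a restart of the proxy service defined in the compose file above:

docker-compose restart proxy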

Reaching localhost from a Docker container

Docker Engine has some built-in networking features to allow containers to communicate with each other and with the outside world. For example, you can link two containers together to allow them to talk to each other, either via the docker run command or in your docker-compose.yml.

version: "2"
services:
  memcached:
    image: memcached:1.4-alpine
  php:
    image: php:5.6-fpm
    links:
      - memcached
  nginx:
    image: nginx:stable-alpine
    links:
      - php
    ports:
      - "80"

In our example, the php container can communicate with the memcached container, the nginx container can communicate with the php container, and the nginx container exposes its port 80 to receive connections from, for example, your web browser.

In some cases, though, your container needs to be able to reach back out to your host system. The specific use case I have is debugging with Xdebug. Using Docker for Mac, there’s not a reliable address you can use to make that connection. A container that tries to connect to localhost or 127.0.0.1 will be talking to itself, not to your host machine.

To work around this, I set up an additional IP address on my host OS’s loopback interface. I use 10.254.254.254; if that conflicts with another application on your machine, any other unused local address will work just as well.

The command to set up this address is:

sudo ifconfig lo0 alias 10.254.254.254

Running this will allow any container you have running to connect back to your host OS using the address 10.254.254.254. You can, for example, set this in your php.ini as the remote host address for Xdebug (I prefer to set it in a reverse proxy configuration, but that’s a topic for a later post).

xdebug.remote_host=10.254.254.254
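For context, here’s a sketch of the related Xdebug 2 settings I keep alongside it; port 9000 is Xdebug 2’s default, so adjust if yours differs:

xdebug.remote_enable=1
xdebug.remote_host=10.254.254.254
xdebug.remote_port=9000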

Verify that the alias is in place with:

ifconfig lo0

Your new address should appear there along with a few other defaults.

You’ll have to run the command after every boot, unless you set up a launchd job to do it for you. There are a few versions of the launchd plist file floating around, or you can adapt one to make your own. Copy it into /Library/LaunchDaemons/ and your new loopback address will be set for you on every boot.
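If you’d rather write one from scratch, here’s a minimal sketch; the label and filename are arbitrary:

sudo tee /Library/LaunchDaemons/com.example.loopback-alias.plist > /dev/null <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.loopback-alias</string>
  <key>ProgramArguments</key>
  <array>
    <string>/sbin/ifconfig</string>
    <string>lo0</string>
    <string>alias</string>
    <string>10.254.254.254</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
EOF

# load it now; launchd will also run it automatically on every boot
sudo launchctl load /Library/LaunchDaemons/com.example.loopback-alias.plist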

Data Persistence with Docker Volumes

Docker containers are designed to be ephemeral. You can destroy one and spin up an exact replica in seconds. Everything that defines the container can be found in the Dockerfile that declares how to build it.

This model does not, however, account for persistent data, like databases or uploaded media. In a production environment, I would recommend delegating these concerns to an external service, like Amazon’s RDS or S3.

For local development, though, you can use volumes for storing persistent data. Volumes come in two main flavors: data volumes and host directory mounts.

The latter is perhaps the more straightforward of the two. You connect a directory in your container to a directory on your host machine, so they are essentially sharing the filesystem. Indeed, when you’re actively working on code, this is the simplest way to share your local code with your running containers. Mount the root directory of your project as a volume in your container, and anytime you update code, your container will also have the updates.

# docker run --rm -it -v="/your/local/dir:/srv/www/public" nginx:stable-alpine /bin/sh

This runs a container that has its /srv/www/public directory shared with your host system’s /your/local/dir directory. Updates you make to files either on your local system or in the container are automatically shared with the other.

Data volumes do not map directly to your host filesystem. Docker stores the data somewhere, and you generally don’t need to know where that is. When using Docker for Mac, one of the key differences is that a host mount shares files using osxfs (which currently has some performance issues), while a data volume stores its data inside the Docker virtual machine (which is consequently much more performant for I/O). While I use host mounts for things like uploaded media, I prefer to use a data volume for storing databases.

# docker volume create --name=mysqldata

Once the volume is created, you can mount it into one or more containers.

# docker run --rm -e="MYSQL_ROOT_PASSWORD=secret" -v="mysqldata:/var/lib/mysql" mysql:5.5

The contents of our volume “mysqldata” will be available to MySQL in the /var/lib/mysql directory. The data volume itself doesn’t have a directory name (in contrast to the prior best practice of using a directory within a data-only container). I think of a volume as a single directory that can be mounted wherever I want in a container.
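To illustrate that last point, the same volume can be mounted anywhere in any container; for example, inspecting its contents with a throwaway Alpine container:

# docker run --rm -v="mysqldata:/inspect" alpine ls -la /inspect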