2014-02-03

Docker and Logstash: Smarter Log Management For Your Containers

Docker currently supports getting logs from a container that logs to stdout/stderr. Everything the process running in the container writes to stdout or stderr, Docker converts to JSON and stores in a file on the host machine's disk, which you can then retrieve with the docker logs command.
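
For the curious, with the default setup that JSON file lives under Docker's data directory on the host, along these lines (the exact path may differ between Docker versions):

# per-container JSON log on the host (path may vary between versions)
cat /var/lib/docker/containers/<container_id>/<container_id>-json.log

# the same content, via the CLI
docker logs <container_id>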

This is handy, but it has its drawbacks: you don't get any log rotation, and the size of the collected logs can become an issue as the file eats up your host's disk space. Not to mention that every time you run docker logs container_id you get all the logs of that process from the beginning.

While there are some interesting things being discussed on the docker-dev mailing list, I wanted to see if I could get docker to play along with proven logging systems using the functionality it has now.

First things first, a couple of requirements:

  • I don't want to run multiple processes in a container (syslog + my_process or /sbin/init)
  • I don't want to have to configure the host to keep track of docker logs

If I ran the container in so-called "machine mode" I could obviously leverage the tools a full-blown system provides (such as syslog). I didn't want to do this because it's not the Docker way, and I want a separation of concerns, meaning every process/service runs in its own container.

The reason for #2 is that I don't want to do heavy-duty host setup when all that's needed is Docker installed, so I can run everything in containers. Think of host machines as throw-away cloud instances that you provision, run a few services on (isolated in containers) and just throw away when done.

The first solution that came to mind was to use bind mounts: mount the host's /dev/log inside the container and have the container log to that. This way I could aggregate the logs on the host and possibly ship them off to a central location. Although a viable solution, I didn't quite care for it, as it still meant properly configuring syslog on the host server. To be fair, I could have used a single syslog container and mounted its volumes into all the containers that needed to log, but I kind of wanted to try out something different than syslog.
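
For reference, a rough sketch of that rejected approach (the ubuntu image and the logger call are just for illustration):

# bind-mount the host's syslog socket into the container and log to it
docker run -v /dev/log:/dev/log -i -t ubuntu /bin/bash -c "logger 'hello from a container'"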

I heard great things about Logstash and Elasticsearch, so I wanted to try this solution out, especially since I've seen the new Kibana web interface for Logstash.

To quote their website: "Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs."

Getting Logstash up and running was fairly trivial, and I've prepared a Dockerfile so you can get started quickly. I'm using the embedded Elasticsearch server, but you can run your own server on a different machine and just pass the IP and port to the logstash container.
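
If you go the external Elasticsearch route, the address has to make its way into the container at run time; a sketch, assuming the image reads hypothetical ES_HOST and ES_PORT environment variables (check the Dockerfile for what it actually expects):

# ES_HOST/ES_PORT are illustrative names, not necessarily what the image uses
docker run -name logstash -p 9292:9292 -d -e ES_HOST="192.168.1.10" -e ES_PORT="9300" -t logstash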

Now, the logstash service is kind of heavy on resources, so I didn't want to run a logstash container on every host machine; rather, I wanted to collect logs from every container on every host machine and send them to a central logstash server. That's where logstash-forwarder comes in.

Logstash-forwarder, previously known as lumberjack, is used to collect the logs from each and every container on a host machine and send them to a central logstash server (or multiple servers). I've prepared a Dockerfile here.

Logstash-forwarder can be configured to watch certain directories and files but I was more interested in the ability to listen on stdin.

So the idea is to run logstash-forwarder on every host machine and expose a named pipe (FIFO) as a volume that other containers on that host can mount and write to.
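
Here's a minimal sketch of what the forwarder container's startup could look like under this scheme (file and config names are illustrative; see the Dockerfile for the real setup):

# create the FIFO inside the volume that other containers will mount
mkfifo /tmp/feeds/fifofeed

# logstash-forwarder can treat the path "-" in its config as stdin;
# opening the FIFO read-write (0<>) keeps it open across writers, so
# the forwarder doesn't see EOF every time a writer disconnects
exec logstash-forwarder -config /etc/forwarder.conf 0<> /tmp/feeds/fifofeed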

So, first things first, let's run the logstash container (see the github link above for how to build it):

docker run -name logstash -p 9292:9292 -d -t logstash

This will allow access to the Kibana web interface on localhost:9292.

Now let's run the logstash-forwarder container (again, see github link above for how to build it):

# replace the IP with the actual IP of the logstash container
docker run -name forwarder -d -v /tmp/feeds -e LOGSTASH_SERVER="172.17.0.69:5043" -t forwarder

Now all we need to do to run a service that writes to the forwarder is:

docker run -volumes-from forwarder -i -t ubuntu /bin/bash -c "echo 'test' >> /tmp/feeds/fifofeed"

If you go to the Kibana web interface you should see that the message got through. We could just as easily build containers using Dockerfiles where we specify a CMD or ENTRYPOINT directive like so:

# My App
#
# VERSION               0.0.1

FROM      ubuntu
MAINTAINER Me "me@email.com"

CMD /usr/local/bin/run_my_app.sh >> /tmp/feeds/fifofeed

The downside of this is that we redirect all output to the FIFO, which short-circuits the docker logs command, as it will get no output anymore. But this is fine, as it's much better to have the logs in a central location than to worry about the logfile filling up the disk space on the host machine.
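
That said, if you did want to keep docker logs working too, one possible variation (just a sketch, not what my Dockerfile does) would be to duplicate the stream with tee instead of redirecting it outright:

# write to the FIFO while still emitting to stdout for docker logs
CMD /usr/local/bin/run_my_app.sh | tee -a /tmp/feeds/fifofeed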

[UPDATE on April 10th 2014]:

As was pointed out to me in the comments (thanks Alan), writing to a named pipe will blow up just the same way writing to an anonymous pipe does when there is no reader at the other end. And even if my application handles the resulting signal and tries to reopen the pipe, the open itself will block for as long as no one is reading.
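
The blocking half of this is easy to reproduce from a shell; with no reader on the other end of the FIFO, the open itself never returns:

# with nothing reading the FIFO, this simply hangs
echo 'test' >> /tmp/feeds/fifofeed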

So I made a small change to the architecture explained above: I mount /dev/log from the host system into the app container; that way the process in the container can be set up to log to syslog, which in this case ends up in the host's syslog. After that I just have another container that runs logstash-forwarder (and also mounts /dev/log from the host) and ships the logs off to the logstash server.
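
On the app side this boils down to a single bind mount (my_app is a placeholder image name):

# the process inside logs via syslog(3), which now hits the host's /dev/log socket
docker run -d -v /dev/log:/dev/log my_app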

The benefit of this approach is, again, that I don't have to do much setup on the host, seeing as every distro comes with some kind of syslog daemon already configured (mostly rsyslogd these days).

