Docker and Logstash: Smarter Log Management For Your Containers
Docker currently supports getting logs from a container that logs to stdout/stderr. Everything the process running in the container writes to stdout or stderr is converted to JSON by Docker and stored in a file on the host machine's disk, which you can then retrieve with the docker logs command.

This is handy, but it has its drawbacks: you don't get any log rotation, and the size of the collected logs can become an issue as it eats up your host's disk space. Not to mention the fact that every time you run docker logs container_id you get all of that process's logs from the beginning.
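For example (the container name here is just a placeholder, and the exact path under /var/lib/docker may differ depending on your Docker version):

# everything the container has written so far, straight from docker
docker logs my_container
# the raw JSON file docker keeps on the host (path may vary by version)
sudo cat /var/lib/docker/containers/&lt;container_id&gt;/&lt;container_id&gt;-json.log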
While there are some interesting things being discussed on the docker-dev mailing list, I wanted to see if I could get Docker to play along with proven logging systems using the functionality I have now.
First things first, a couple of requirements:
- I don't want to run multiple processes in a container (syslog + my_process or /sbin/init)
- I don't want to have to configure the host to keep track of docker logs
If I run the container in a so-called "machine mode", I can obviously leverage the tools a full-blown system provides (such as syslog). I didn't want to do this because it's not the Docker way, and I want a separation of concerns, meaning every process/service runs in its own container.
The reason for #2 is that I don't want to do heavy-duty host setup when all that's needed is Docker installed so I can run everything in a container. Think of host machines as throw-away cloud instances that you provision, run a few services on (isolated in containers) and just throw away when done.
The first solution that came to mind was to use bind mounts: mount the host's /dev/log inside the container and just have the container log to that. This way I could aggregate the logs on the host and possibly ship them off to a central location. Although a viable solution, I didn't quite care for it as it still meant having to properly configure syslog on the host server.
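That approach would have looked roughly like this (the image name is just a placeholder):

# bind-mount the host's syslog socket into the container so the app can log to it
docker run -v /dev/log:/dev/log -d -t my_app_image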
To be fair, I could have used a single syslog container and mounted its volumes into all the containers that needed to log stuff, but I kind of wanted to try out something different than syslog.
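A rough sketch of that alternative, with made-up image names and socket path, would be a syslog container that owns a shared volume and app containers that mount it:

# a dedicated syslog container exposing a volume where it creates its log socket
docker run -name syslog -v /syslog -d -t my_syslog_image
# app containers mount that volume and point their logging at the shared socket
docker run -volumes-from syslog -d -t my_app_image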
I'd heard great things about Logstash and Elasticsearch, so I wanted to try this solution out, especially since I'd seen the new Kibana web interface for Logstash.
To quote their website: "Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs."
Getting Logstash up and running was fairly trivial and I've prepared a Dockerfile so you can get started quickly. I'm using the embedded Elasticsearch server, but you can run your own server on a different machine and just pass its IP and port to the logstash container.
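As a sketch, pointing the container at an external Elasticsearch might look something like this; ES_HOST and ES_PORT are hypothetical variable names, check the Dockerfile for the ones it actually expects:

# hypothetical environment variable names, see the Dockerfile for the real ones
docker run -name logstash -p 9292:9292 -e ES_HOST=10.0.0.5 -e ES_PORT=9200 -d -t logstash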
Now, the Logstash service is kind of heavy on resources, so I didn't want to run a logstash container on every host machine; rather, I wanted to collect logs from every container on every host machine and send them to a central Logstash server. That's where logstash-forwarder comes in.
Logstash-forwarder, previously known as lumberjack, is used to collect the logs from each and every container on a host machine and send them to a central Logstash server (or multiple servers). I've prepared a Dockerfile here.
Logstash-forwarder can be configured to watch certain directories and files, but I was more interested in its ability to listen on stdin.
So the idea is to run logstash-forwarder
on every host machine and expose a named pipe (FIFO) as a volume that
other containers on that host can mount and write to.
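Roughly, what happens inside the forwarder container is something like the following; the file names and the -config flag reflect how logstash-forwarder is usually invoked, so treat this as a sketch rather than the exact contents of my Dockerfile:

# create the named pipe inside the /tmp/feeds volume that other containers will mount
mkfifo /tmp/feeds/fifofeed
# feed whatever gets written to the pipe into logstash-forwarder's stdin;
# the config file lists the central logstash server(s) to ship to
logstash-forwarder -config /etc/logstash-forwarder.conf &lt; /tmp/feeds/fifofeed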
So, first things first, let's run the logstash container (see github link above for how to build it):
docker run -name logstash -p 9292:9292 -d -t logstash
This will allow access to the Kibana web interface on localhost:9292.
Now let's run the logstash-forwarder
container (again, see github link above for how to build it):
# replace the IP with the actual IP of the logstash container
docker run -name forwarder -d -v /tmp/feeds -e LOGSTASH_SERVER="172.17.0.69:5043" -t forwarder
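A quick way to get that IP (the -format flag and template may differ slightly between Docker versions):

docker inspect -format '{{ .NetworkSettings.IPAddress }}' logstash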
Now all we would need to do to run a service that would write to the forwarder
is:
docker run -volumes-from forwarder -i -t ubuntu /bin/bash -c "echo 'test' >> /tmp/feeds/fifofeed"
If you go to the Kibana web interface you should see that the message got through.
We could just as easily build containers using Dockerfiles where we specify a CMD or ENTRYPOINT directive, like so:
# My App
#
# VERSION 0.0.1
FROM ubuntu
MAINTAINER Me "me@email.com"
CMD /usr/local/bin/run_my_app.sh >> /tmp/feeds/fifofeed
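Building and running that image is then just (the my_app tag is arbitrary):

docker build -t my_app .
docker run -volumes-from forwarder -d -t my_app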
The downside of this is that we redirect all output to the FIFO pipe and therefore short-circuit the docker logs command, as it will get no output anymore. But this is fine, as it's much better to have the logs in a central location and not worry about the logfile filling up the disk space on the host machine.
[UPDATE on April 10th 2014]:
As was pointed out to me in the comments (thanks Alan), a named pipe will blow up just the same way an anonymous pipe would if there is no reader at the reading end. This means that even if my application handles the appropriate signal and reopens the pipe, it would block as long as there is no one on the reading end.
So I made a small change to the architecture explained above: I mount /dev/log from the host system into the app container, so the process in the container can be set up to log to syslog, which in this case ends up in the host's syslog. After that, I just have another container that runs logstash-forwarder (and also mounts /dev/log from the host) and ships the logs off to the Logstash server.
The benefit of this approach is, again, that I don't have to do much setting up on the host seeing as every distro comes with some kind of syslog daemon already set up (mostly rsyslogd these days).
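Sketched out, with placeholder image names, the updated setup looks something like this:

# the app container logs to syslog through the host's /dev/log socket
docker run -v /dev/log:/dev/log -d -t my_app
# the forwarder container (also mounting /dev/log from the host, as described above)
# ships the collected logs on to the central logstash server
docker run -v /dev/log:/dev/log -e LOGSTASH_SERVER="172.17.0.69:5043" -d -t forwarder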