I currently have a Java application which generates a set of Log4j2 log files (RollingFile appender) which is harvested by Filebeat (and sent on to Logstash and ES in the usual way). This all works fine.
But now there is a desire to run the Java application in a Docker container (using docker-compose, at least for the time being).
What is the best approach? One option, which sounds the least disruptive to the existing code and architecture, would be:

- use a volume to write the application's log files to a host directory, continuing to use log4j2 and the RollingFile appender
- run Filebeat on the host in the usual way (nobody has told me that Filebeat has to run under Docker)
But there are no doubt many other approaches, so I'm interested in how others have done this.
(I am, I hope, reasonably familiar with Java, log4j2 and the Elastic stack. I know very little about docker and docker-compose.)
Oh, and a related question (which isn't really an Elastic question): it seems reasonable to include the log4j2.xml file in the container image (for ease of deployment in the usual case), but to be able to override it at run time (e.g. to wind up the logging level for debugging). Is there a conventional best-practice way of doing this?
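To make the volume option concrete, here is the sort of docker-compose sketch I have in mind. Everything in it is illustrative: the service name, image, and paths are made up, and the JAVA_OPTS override only takes effect if the image's entrypoint actually passes it to the JVM.

```yaml
# docker-compose.yml -- illustrative sketch of the host-volume option.
version: "3"
services:
  myapp:                           # placeholder service name
    image: mycompany/myapp:latest  # placeholder image
    volumes:
      # The RollingFile appender writes to /app/logs inside the
      # container; Filebeat on the host harvests /opt/myapp/logs
      # exactly as it does today.
      - /opt/myapp/logs:/app/logs
      # Optionally override the log4j2.xml baked into the image,
      # e.g. with a debug-level configuration.
      - ./log4j2-debug.xml:/app/config/log4j2.xml:ro
    environment:
      # Only works if the entrypoint passes JAVA_OPTS to the JVM.
      - JAVA_OPTS=-Dlog4j2.configurationFile=/app/config/log4j2.xml
```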
The recommended way to do logging in Docker is to send your logs to stdout and let Docker take care of handling them.
Filebeat is able to understand Docker's default json-file format. In a nutshell: Docker writes logs for all running containers under /var/lib/docker/containers/. You can use Filebeat settings along these lines to fetch and process them:
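(A sketch for Filebeat 6.x; on 5.x the prospector key is input_type rather than type. The json.* decoding options are the part that matters.)

```yaml
filebeat.prospectors:
  - type: log
    paths:
      - /var/lib/docker/containers/*/*.log
    # Each line from the json-file driver looks like
    # {"log":"...original line...","stream":"stdout","time":"..."}.
    # Decode the wrapper and promote the original line into the event:
    json.message_key: log
    json.keys_under_root: true
```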
You can run Filebeat as a Docker container if that makes things easier; you just need to make sure you bind mount the following paths into the container (see the sketch after the list):

- /var/lib/docker/containers, to allow Filebeat to access the logs.
- /var/run/docker.sock, to give access to the Docker daemon (for metadata fetching).
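Something like this, as a service alongside your application (the image tag and the filebeat.yml path on the host are assumptions):

```yaml
# Hypothetical Filebeat service in the same docker-compose.yml.
filebeat:
  image: docker.elastic.co/beats/filebeat:6.2.4  # pick your version
  user: root
  volumes:
    # Your own Filebeat configuration, mounted read-only.
    - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
    # Logs written by Docker's json-file logging driver.
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
    # Docker daemon socket, for metadata fetching.
    - /var/run/docker.sock:/var/run/docker.sock:ro
```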
Ta. I'd come across this suggestion, but isn't the problem that other stuff might go to stdout as well as my logs? That could make processing the logs further downstream something of a pain (e.g. having to handle grok parse filters in Logstash) and, I gather, would also make putting multiline log entries back together (specifically Java stack traces) something of a pain.
You can match image names to decide on multiline settings, or, for instance, use Docker labels to inject your multiline settings into Filebeat. We plan to make this a feature in 6.3.
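For Java stack traces, the multiline settings themselves typically look like this (the pattern here is an assumption; adjust it to match the timestamp at the start of your own log format):

```yaml
# Treat any line that does not start with a yyyy-MM-dd timestamp
# as a continuation of the previous event (e.g. a stack trace line).
multiline.pattern: '^\d{4}-\d{2}-\d{2}'
multiline.negate: true
multiline.match: after
```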
My concern over multiline handling (which I've got working fine for the non-dockerised version of the application) was this: what if multiple things within the same container write to stdout, interleaving their lines, some in my standard Java log format and others in whatever random format someone else chose to write to stdout or stderr? Surely there's no way the Filebeat multiline handling can cope with that? (Having said that, I haven't seen any instances of this happening yet.)
I'm aware of the Docker metadata features, which I will investigate in due course, but I was under the impression they were a 6.x feature, and I haven't yet found the time to upgrade from 5.6.
I've now looked at having the application log to a Console appender, and then picking up the Docker log from outside the container with Filebeat.
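For reference, the sort of log4j2.xml I'm experimenting with is along these lines (a minimal sketch; the pattern and level are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch: everything goes to stdout so the json-file
     logging driver (and hence Filebeat) can pick it up. -->
<Configuration status="warn">
  <Appenders>
    <Console name="Stdout" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level %logger - %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Stdout"/>
    </Root>
  </Loggers>
</Configuration>
```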
But what I have observed so far is that when the container terminates, its directory under /var/lib/docker/containers/ (which contains the log files) vanishes. Doesn't this mean there's a fair chance the log file will be deleted before Filebeat has read the last few lines, which are exactly the ones that tell you why the container crashed in the first place? So at first sight this approach doesn't look useful. What am I missing?
Ah, OK, that was because I stopped the container with "docker-compose down", which removes the container and its log directory along with it. If I instead change the application to crash with an NPE, the container is merely stopped, the log doesn't get deleted, and so that looks OK.