Hi, I've been reading a number of articles as I try to put together a design for log aggregation. My situation is that I have a relatively small (no huge traffic loads) distributed application. The overall app will be running in Docker containers (two running Spring Boot Java apps, one running Mongo, one RabbitMQ, and finally a small Python app). Oh, and I have a container running ELK (sebp/elk).
I've read myself into confusion trying to decide whether I need Filebeat, Logstash, or both. Should I be reading the Docker logs or the individual services' logs? Before going down the wrong path I wanted to get some advice.
Logstash takes the data it is given and parses/enriches it. Delivery of the data to Logstash is performed by a Logstash input or by an agent like Filebeat. Logstash has numerous officially supported input plugins and many more community-contributed ones.
As far as the logs you "should" be looking at, well, that is up to you. There might be a Docker plugin somewhere that can pull/query the logs, but I am not familiar enough with that to really be of any assistance. If the logs you want to ingest are in a file somewhere, Filebeat would definitely be a potential solution.
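To make that concrete, a rough sketch of the Logstash side when receiving from a Filebeat agent could look like the below; the port and Elasticsearch host are just placeholder values for your own setup:

```
input {
  # Listen for events shipped by a Filebeat agent
  # (5044 is the conventional beats port).
  beats {
    port => 5044
  }
}

output {
  # Placeholder destination -- point this at your own Elasticsearch.
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```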
Is life made simpler if I configure my apps to output log information in a specific format (JSON, or something else)? Right now the logs are being generated with Spring's default log patterns, which appears to mean (unless I'm wrong) that I would need to build some grok filters in Logstash to separate the parts of the message. Is that true?
I'm unfamiliar with what Spring is. The format is really up to you, Logstash's ability to interpret it, and your ability to configure Logstash to interpret it properly, lol. The filter plugin section of the Logstash reference documentation has numerous plugins that will read a variety of format standards, such as JSON, CSV, XML, etc.
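As a rough example, if your apps logged each line as a JSON document, the Logstash filter section could be as simple as the json plugin (the source shown here is Logstash's default "message" field):

```
filter {
  # Parse the raw line (kept in the default "message" field) as JSON
  # and promote its keys to top-level event fields.
  json {
    source => "message"
  }
}
```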
Logstash is heavy duty: it can import data (directly from log files, via a Beat, or from one of the many dedicated input plugins), manipulate that data to your heart's content, and output it with various output plugins.
The Beats, like Filebeat, are lightweight and concentrate on one thing: shipping data from an input to an output, with only very limited data manipulation possible.
The ideal scenario for a large server farm would be to have lightweight Beats like Filebeat running on each server, all feeding into a dedicated Logstash instance/cluster that does the hard work. But for fewer servers with spare resources, running Logstash on each server itself is sufficient. My Logstash instances take up anywhere between 500-800 MB of memory and 2-3% CPU.
Logstash can output as JSON (many of the outputs default to JSON), and if there is no dedicated filter plugin for your data source and you do not want to re-code your apps, then just use grok to structure the data before leaving the default output as JSON.
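As a sketch of that, a grok filter along these lines pulls a timestamp, a log level, and the rest of the line into separate fields; the exact pattern and field names depend on what your apps really emit, so treat them as placeholders:

```
filter {
  # Placeholder pattern -- adjust it to match your actual log format.
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} +%{LOGLEVEL:level} +%{GREEDYDATA:log_message}" }
  }
}

output {
  # rubydebug prints the structured event so you can check the grok
  # pattern; swap in your real output once it looks right.
  stdout { codec => rubydebug }
}
```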
Thanks, this helps. I think where I keep confusing myself is how to deal with Docker. I see some articles where the "docker logs" output is what gets shipped to ELK, versus other examples where each container, from within the container, pushes information to ELK. I keep going back and forth between the two approaches, and any thoughts/guidance would be helpful.
Also of note: when it comes to pushing the actual Docker log, I'm finding that there are differences based on where Docker is running (I do dev on Mac but will be deploying to Linux), which I don't really like - but again, I could very well be missing a better approach.