I have just set up an ELK stack (on CentOS 6.7). I have Filebeat set up to harvest files from an application server, but I'm trying to figure out what the current best practices are for shipping/parsing/searching log4j data, since that will be our primary usage of ELK, at least at first.
I am able to get the logs coming in and see data in Kibana, but some of the data seems to be missing (for example, thread id) and multiline is not working. I don't find this all that surprising, since so far I have only done enough config to get the files harvested by Filebeat, sent to Logstash (with the Beats input plugin), and then on to ES, just to see something end-to-end.
If someone is starting with ELK 6, what would be the "best" path forward?
I'd say you have to figure out for yourself what exactly you want to learn from the logs. Some users are fine with raw logs, while others want full parsing plus dashboards that compute metrics based on the logs. Also, log4j is quite configurable, so the right parsing depends on the layout you have configured.
The next steps would be to configure multiline in Filebeat and to do some processing on the logs in Logstash. In Filebeat you want to configure multiline so that stack traces are captured as part of the event they belong to, and in Logstash use grok/dissect to do the parsing. If IPs are included, maybe add geoip. Also add some more data cleanup, e.g. replace constants (such as error codes) with their actual names/messages.
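To make the two steps concrete, here is a minimal Python sketch of the logic only; it is not Filebeat or Logstash config. It assumes a log4j pattern along the lines of `%d{yyyy-MM-dd HH:mm:ss,SSS} [%t] %-5p %c - %m%n`, which is a guess at your layout, and the sample lines are made up. Lines that do not start with a date are appended to the previous event (the idea behind multiline), and a regex with named groups then pulls out the fields (roughly what a grok pattern would do).

```python
import re

# Assumed layout (hypothetical, adjust to your log4j config):
#   %d{yyyy-MM-dd HH:mm:ss,SSS} [%t] %-5p %c - %m%n
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")
EVENT_FIELDS = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$",
    re.DOTALL,  # let the message span the joined stack-trace lines
)


def group_events(lines):
    """Join continuation lines (e.g. stack traces) onto the preceding event."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if EVENT_START.match(line) or not events:
            events.append(line)          # a new event begins with a date
        else:
            events[-1] += "\n" + line    # anything else continues the last event
    return events


def parse_event(event):
    """Split one event into fields; fall back to the raw text if it does not match."""
    match = EVENT_FIELDS.match(event)
    return match.groupdict() if match else {"message": event}


# Made-up sample input for illustration only.
SAMPLE = [
    "2016-01-05 10:15:30,123 [main] INFO  com.example.App - Started",
    "2016-01-05 10:15:31,456 [worker-1] ERROR com.example.Job - Job failed",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Job.run(Job.java:42)",
]

if __name__ == "__main__":
    for event in group_events(SAMPLE):
        print(parse_event(event))
```

In a real setup the grouping would be done by Filebeat's multiline settings and the field extraction by a grok or dissect filter in Logstash; the sketch only shows why the thread id goes missing until the line is actually parsed.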
log4j can already create JSON documents. So instead of implementing parsers and filters in Logstash, consider writing your log output as JSON (see JSON Layout: https://logging.apache.org/log4j/2.x/manual/layouts.html). For Filebeat to process these JSON documents, use compact=true and eventEol=true in your log4j config, so that each event is written as a single line. Both Filebeat and Logstash support JSON parsing. If you don't need to filter events or remove fields in Filebeat before publishing to Logstash (in order to save some bandwidth), consider doing the JSON parsing in Logstash, to save some CPU/memory in Filebeat.
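As a rough illustration of why one-JSON-document-per-line output is easier to handle, here is a small Python sketch. The field names (`thread`, `level`, `loggerName`, `message`) and the sample event are assumptions for illustration; check what your JSON Layout actually emits. The `json_error` key is a hypothetical marker, not something Filebeat or Logstash produces under that name.

```python
import json


def parse_json_lines(lines):
    """Decode one JSON document per line; keep undecodable lines as raw messages."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            doc = None
        if isinstance(doc, dict):
            events.append(doc)
        else:
            # Hypothetical fallback: keep the raw line and flag it.
            events.append({"message": line, "json_error": True})
    return events


# Made-up event emulating compact, one-line-per-event output.
# The stack trace sits inside a single JSON string, so no multiline
# handling is needed to keep it attached to its event.
SAMPLE_LINE = json.dumps({
    "thread": "worker-1",
    "level": "ERROR",
    "loggerName": "com.example.Job",
    "message": "Job failed\njava.lang.IllegalStateException: boom",
})

if __name__ == "__main__":
    for event in parse_json_lines([SAMPLE_LINE, "plain text line"]):
        print(event.get("thread"), event.get("level"), repr(event["message"]))
```

The thread id and level arrive as ready-made fields, which is the main reason to prefer this route over grok patterns when you control the log4j configuration.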