We are thinking about using an Elastic (ELK) stack to process logs from our servers. We have about ten virtual Ubuntu servers that run on-premises. Each of those servers runs several Docker containers, currently orchestrated by docker-compose.
I now have a quick-and-dirty proof of concept running. It collects logs from some of the containers using a `container` input in a containerized Filebeat, transforms the data in a containerized Logstash, and finally ships the processed data to Elasticsearch.
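For reference, the Filebeat side of that proof of concept looks roughly like this (a simplified sketch; the log path and the Logstash host/port are placeholders from my compose setup):

```yaml
# filebeat.yml (sketch) -- the Docker log directory is bind-mounted into the Filebeat container
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log

# Ship everything to the containerized Logstash for further processing
output.logstash:
  hosts: ["logstash:5044"]
```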
Now I would like to parse the logs into the Elastic Common Schema (ECS). However, I don't know how to do that properly.
- How do I tell Filebeat (or Logstash?) that container X produces nginx logs? Should I just parse `container.image.name`, for example? (See the Filebeat processor sketch after this list for the kind of thing I mean.)
- Once I know that container X produces nginx logs: how do I parse those logs into ECS fields? Do I need to implement my own grok filters, or can I reuse an existing solution? (If so: which one?) A hand-rolled grok sketch of what I would like to avoid is below.
- What is the proper way to populate fields like `host.name`? (Remember, Filebeat is containerized.)
- Probably related: I currently add a literal tag containing the hostname of the Ubuntu server in Filebeat, so that I can distinguish between servers when feeding into my multi-pipeline Logstash. Is this the correct way to distinguish between servers?
- I would like to keep only the fields I actually deem useful. What is the best practice for that? A Logstash `prune` filter with a `whitelist_names`/`blacklist_names` option, as sketched below?
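To illustrate the first and third bullet: this is the kind of Filebeat processor configuration I mean (a sketch, not a working setup; it assumes the Docker socket is mounted into the Filebeat container):

```yaml
# filebeat.yml (sketch)
processors:
  # Adds container.id, container.image.name, container.name, labels, ...
  # (assumes /var/run/docker.sock is mounted into the Filebeat container)
  - add_docker_metadata:
      host: "unix:///var/run/docker.sock"
  # Adds host.* fields (ip, mac, os, ...) -- but inside a container these
  # describe the Filebeat container, not the Ubuntu host, unless the host's
  # information is somehow made visible to the container.
  - add_host_metadata: ~
```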
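For the parsing bullet, this is what I imagine doing by hand in Logstash if there is no reusable solution: a grok filter that targets ECS field names, guarded by a conditional on the container image (a sketch assuming the default nginx "combined" access log format; the conditional is exactly the part I am unsure about):

```
filter {
  if [container][image][name] =~ /nginx/ {
    grok {
      # Map the nginx "combined" access log format onto ECS field names
      match => {
        "message" => '%{IPORHOST:[source][address]} - %{DATA:[user][name]} \[%{HTTPDATE:[tmp][nginx_time]}\] "%{WORD:[http][request][method]} %{DATA:[url][original]} HTTP/%{NUMBER:[http][version]}" %{NUMBER:[http][response][status_code]:int} %{NUMBER:[http][response][body][bytes]:int} "%{DATA:[http][request][referrer]}" "%{DATA:[user_agent][original]}"'
      }
    }
    # Use the access log timestamp as the event timestamp
    date {
      match => [ "[tmp][nginx_time]", "dd/MMM/yyyy:HH:mm:ss Z" ]
      remove_field => [ "[tmp]" ]
    }
  }
}
```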
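And for the last bullet, the kind of whitelisting I have in mind (again a sketch; the field names are only examples, and as far as I can tell `prune` matches top-level field names only):

```
filter {
  prune {
    # Keep only these top-level fields (regex match on field names);
    # everything else is dropped.
    whitelist_names => [
      "^@timestamp$",
      "^message$",
      "^host$",
      "^container$",
      "^http$",
      "^url$",
      "^user_agent$",
      "^source$"
    ]
  }
}
```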
I am mainly interested in concepts and answers like "use feature X of Filebeat" or "don't use multiple pipelines because...". I can already build a proof of concept somehow; now I would like to learn how it would be done properly. Links and pointers are very much appreciated.