High level concepts of elastic stack in containerized environment

We are thinking about using an Elastic (ELK) stack to process logs from our servers. We have about ten virtual Ubuntu servers that run on premises. All of those servers run some Docker images, currently orchestrated by docker-compose.

I now have a quick-and-dirty proof of concept running. It collects logs from some of the containers using a container input in a containerized Filebeat, transforms the data in a containerized Logstash, and finally ships the processed data to Elasticsearch.

Now I would like to parse the logs into Elastic Common Schema (ECS). However, I don't know how to do that properly.

  • How do I tell Filebeat (or Logstash?) that container X produces nginx logs? Should I just parse e.g. container.image.name?
  • After knowing that container X produces nginx logs: how do I parse those logs into ECS fields? Do I need to implement my own grok filters, or can I reuse an existing solution? (If so: which one?)
  • What is the proper way of populating fields like host.name? (Remember, Filebeat is containerized.)
    • Probably related: I currently add a tag (literal) containing the hostname of the Ubuntu server in Filebeat, so that I can distinguish between servers when feeding into my multi-pipeline Logstash. Is this the correct way to distinguish between servers?
  • I would like to only keep fields that I actually deem useful. What is the best practice for doing that? A Logstash prune filter with a black(/white)list_names task?

I am mainly interested in concepts and answers like "use feature X of Filebeat" or "don't use multiple pipelines because...". I can already build my proof of concept somehow. I would now like to learn how it would be done properly. Links and redirects are very much appreciated.

Hi @siplsag, welcome to discuss :slight_smile:

Take a look at the autodiscover docs. Autodiscover helps configure dynamic environments; it has a Docker provider that can detect Docker containers and apply a configuration to them. There are two main ways of providing this configuration: using config templates, or using hints.
Config templates are defined in the autodiscover configuration, while hints can be defined as labels on your containers.
You can use Filebeat modules in your configuration; they already include parsing pipelines for common log formats such as nginx.
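As a sketch, a config template for the Docker autodiscover provider could look like this. The condition on the image name and the module choice are examples; adjust them to your containers:

```yaml
# filebeat.yml (sketch): docker autodiscover with a config template.
# "nginx" in the condition is an example; match it to your image names.
filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            contains:
              docker.container.image: nginx
          config:
            - module: nginx
              access:
                input:
                  type: container
                  paths:
                    - /var/lib/docker/containers/${data.docker.container.id}/*.log
```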

Filebeat modules already follow ECS, so if you use the nginx module to parse your logs, they will be ECS-compliant.
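Alternatively, if you enable hints in the autodiscover provider (`hints.enabled: true`), the same mapping can live next to the container itself as labels in your docker-compose file. The service name below is hypothetical:

```yaml
# docker-compose.yml (sketch): hint labels telling Filebeat's autodiscover
# that this container's logs should be parsed with the nginx module.
services:
  web:                 # hypothetical service name
    image: nginx:latest
    labels:
      co.elastic.logs/module: "nginx"
      co.elastic.logs/fileset.stdout: "access"
      co.elastic.logs/fileset.stderr: "error"
```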

If you need to define your own processors or ingest pipelines, the choice of fields is up to you; the recommendation would be to use ECS fields where possible.

This is a good question, because it is still a challenging topic. As you have probably seen, Filebeat fills this field with the random hostname of the container, which is probably not what you expect. Some things you can try:

  • Use the name setting to override the values identifying the agent.
  • Run Filebeat on the host network, so it has the same hostname as the host.
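Both options could look something like this; the hostname and image tag are placeholders:

```yaml
# --- filebeat.yml (sketch): override the agent name explicitly ---
name: "ubuntu-srv-01"        # placeholder: your server's hostname

# --- docker-compose.yml (sketch): run the Filebeat container on the ---
# --- host network so it picks up the host's hostname ---
services:
  filebeat:
    image: docker.elastic.co/beats/filebeat:7.9.0   # placeholder version
    network_mode: host
```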

Yes, tags or custom fields can be used to distinguish between groups of servers, for example servers in different datacenters or regions. But to distinguish specific agents, it'd be better to rely on agent.name or host.name.

Take a look at Filebeat processors; they can be used to alter the fields in the delivered events, including dropping fields or whole events.
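For example, dropping fields you don't need could look like this; the field names below are just examples of fields you might deem unnecessary:

```yaml
# filebeat.yml (sketch): drop fields from every event before shipping.
# The listed fields are examples, not a recommendation.
processors:
  - drop_fields:
      fields: ["agent.ephemeral_id", "ecs.version"]
      ignore_missing: true
```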

As a general recommendation: when possible, send events directly from Beats to Elasticsearch. This is enough for most use cases and simplifies deployment. Even though more complex architectures using Logstash or Kafka are possible, they should only be used if needed, as they are more complicated to maintain.
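Shipping directly is just a matter of pointing the Filebeat output at your cluster; host and credentials here are placeholders:

```yaml
# filebeat.yml (sketch): ship events directly to Elasticsearch,
# skipping Logstash. Host and credentials are placeholders.
output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  username: "elastic"
  password: "changeme"
```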


Thank you very much for your time. I will need a few days until I can apply your suggestions, but I am confident that your info gives me a nice push in the right direction.

I consider the question answered to my satisfaction. :+1:t2:
