Design and implementation dimensions on pipelines and indices

I'm fairly new to the Elastic Stack and am trying to set up a lab environment for a proof of concept. I've watched almost every Getting Started video on the Elastic website and read the documentation thoroughly while installing my setup, so I know the components and the config used.

I've managed to have a working setup with Filebeat (system and apache2 modules), Logstash, Elasticsearch and Kibana. Each of the components is working, and I can discover the harvested logs in Kibana.

But there is one thing I can't understand, even after hours of testing and reading documentation: the dimensions in which Logstash pipelines and Elasticsearch indices relate to each other. I have a working log flow, but I don't understand in which dimension to scale when it comes to the complexity / scope of logging.

Situation: I've enabled the Filebeat modules system and apache2. I have one beats.conf with a beats input and an elasticsearch output. When I do no grok filtering, my Apache log messages are not split into fields. But when I configure grok filtering, the system logs are tagged with _grokparsefailure. I understand the nothing vs. all situation here. But I can't get my head around choosing the right scaling direction from this point on, and I can't find documentation on this matter.

Questions I have:

  1. With multiple enabled Filebeat modules, do I need multiple Logstash pipelines (on different ports)?
  2. With multiple enabled Filebeat modules, do I need multiple Elasticsearch indices to store the different Filebeat modules separately?
  3. Can anybody explain this in a logical manner, along the lines of "each Beats module requires a custom Logstash pipeline" or "each pipeline requires a custom Elasticsearch index"?

Or can anybody point me to a piece of documentation on these dimensions in designing pipelines and indices?

Thanks in advance!

Welcome!

  1. You can do that, or you can add a tag based on the source or Beat name, then do conditional filtering based on the tag. For example, if you set a "beat name" in the config on your Beats, then you could do the following:

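     filter {
       # Events from the Beat whose name is "your-beat-name-1"
       if [beat][name] == "your-beat-name-1" {
         grok {
           # grok options for this source go here
         }
       }

       # Events from a second Beat get their own processing
       if [beat][name] == "your-beat-name-2" {
         mutate {
           # mutate options for this source go here
         }
       }
     }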
    
  2. Again, it depends on how you want to operate. You can index everything into one index or you can split them up. It is entirely up to you. I prefer to split them up personally and that's how I do it. You can add tags as in the above example, then in your output section do conditional output to Elasticsearch based on the tags and specify an index for each.
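
To make the conditional output concrete, here is a minimal sketch rather than a definitive config. It assumes Filebeat 6.x field names, where module events carry a `[fileset][module]` field (newer versions use `[event][module]` instead), an Elasticsearch node on `localhost:9200`, and index names that are made up for illustration:

```
output {
  # Assumption: Filebeat 6.x modules set [fileset][module] on each event
  if [fileset][module] == "apache2" {
    elasticsearch {
      hosts => ["localhost:9200"]          # assumed address
      index => "apache-%{+YYYY.MM.dd}"     # hypothetical index name
    }
  } else if [fileset][module] == "system" {
    elasticsearch {
      hosts => ["localhost:9200"]          # assumed address
      index => "system-%{+YYYY.MM.dd}"     # hypothetical index name
    }
  } else {
    # Anything that matches neither module lands in a catch-all index
    elasticsearch {
      hosts => ["localhost:9200"]          # assumed address
      index => "other-%{+YYYY.MM.dd}"      # hypothetical index name
    }
  }
}
```

The same kind of `if` condition wrapped around a grok filter keeps an Apache pattern from running against system logs, which is what produces the `_grokparsefailure` tag in a single shared pipeline.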

Thanks for your reply!

Can you explain your choices/decisions on splitting up indices? What is your reason to split?

  1. Do you split each Logstash Pipeline into a different index?
  2. Do you split each Filebeat Module into a different index?
  3. Do you split up based on corresponding sources (i.e. all Apache logs, whichever way they pass to Elasticsearch)?

The reason I prefer to split is that I like having more customisation, such as the ability to retain some indices longer than others. For example, if you lump filebeat, metricbeat and heartbeat all into a single "beat" index, then you can only keep that index for one specified period of time, e.g. 30 days.

If you split them into heartbeat-*, filebeat-* and metricbeat-*, then you could, for example, keep heartbeat for 7 days, filebeat for 60 days and metricbeat for 30 days.
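
One way to get per-Beat index names like these without writing a conditional for every Beat is to build the name from event metadata. This is only a sketch: it assumes the events arrive through the beats input (Beats send a `[@metadata][beat]` field with the Beat's type) and that Elasticsearch is reachable on `localhost:9200`:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed address
    # [@metadata][beat] holds the Beat type (filebeat, metricbeat, heartbeat),
    # so each Beat gets its own daily index, e.g. filebeat-YYYY.MM.dd
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
  }
}
```

Because each Beat then writes to its own set of daily indices, whatever retention job you use can drop old heartbeat-* indices on a different schedule from filebeat-* or metricbeat-*.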

  1. Again it depends. I keep all metricbeats, for example, in the same index, but use "beat.name" to make it easier to filter. For example, I could have some Delivery Controllers with a beat name of "DCs" and some file servers with beat name "FileServer", and that way I can say "show me the CPU usage for all DCs" and not get every server in the index's results. Does that make sense? Again, it's down to preference!
  2 and 3. I don't do that much with filebeat specifically at the moment, other than for web servers, which I collect into an NGINX log. Again, I use the beat name field here to make it easier to filter rather than splitting the index up. But again, whatever is best for you. If you have an X-Pack subscription and use user security, you might want to consider where best to split, because you may want some users to only see the log files for webserver 1 and 2, but not webserver 3 and 4.