Design and implementation dimensions on pipelines and indices


(Acropia) #1

I'm fairly new to the Elastic Stack and am trying to set up a lab environment as a proof of concept. I've watched almost every Getting Started video on the Elastic website and read the documentation thoroughly while installing my setup, so I know the components and configuration used.

I've managed to have a working setup with Filebeat (system and apache2 modules), Logstash, Elasticsearch and Kibana. Each of the components is working, and I can discover the harvested logs in Kibana.

But there is one thing I can't understand, even after testing and reading documentation for hours: the dimensions in which Logstash pipelines and Elasticsearch indices relate to each other. I have a working log flow, but I don't understand which dimension to scale in when it comes to the complexity/scope of logging.

Situation: I've enabled the Filebeat modules system and apache2. I've got one beats.conf with an input of type beats, and output to Elasticsearch. When I do no grok filtering, my Apache log messages are not split into fields. But when I configure grok filtering, the system logs are tagged with _grokparsefailure. I understand the nothing-vs.-all situation here. But I can't get my head around choosing the right scaling direction from this point on, and I can't find documentation on this matter.

Questions I have:

  1. With multiple enabled filebeat modules, do I need multiple Logstash pipelines (on different ports)?
  2. With multiple enabled Filebeat modules, do I need multiple Elasticsearch indices to store the different Filebeat modules separately?
  3. Can anybody explain this in a logical manner? For example, does each Beat module require a custom Logstash pipeline, or does each pipeline require a custom Elasticsearch index?

Or can anybody point out a piece of documentation on these dimensions in designing pipelines and indices?

Thanks in advance!


(Lewis Barclay) #2

Welcome!

  1. You can do that, or you can add a tag based on the source or beat name and then do conditional filtering based on the tag. For example, if you add a name to the config on your Beats, you could do the following:

     filter {
         if [beat][name] == "your-beat-name-1" {
             grok {
                 # grok patterns for this beat's logs
             }
         }

         if [beat][name] == "your-beat-name-2" {
             mutate {
                 # mutations for this beat's logs
             }
         }
     }
    
  2. Again, it depends how you want to operate. You can index everything into one index or you can split things up; it is entirely up to you. Personally, I prefer to split them up, and that's how I do it. Again, you can add tags as in the example above, then in your output do conditional output to Elasticsearch based on the tags and specify an index.
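A minimal sketch of such a conditional output (the tag names "apache" and "system", the host, and the index names are just placeholders for whatever you use):

     output {
         if "apache" in [tags] {
             elasticsearch {
                 hosts => ["localhost:9200"]
                 index => "apache-%{+YYYY.MM.dd}"
             }
         } else if "system" in [tags] {
             elasticsearch {
                 hosts => ["localhost:9200"]
                 index => "system-%{+YYYY.MM.dd}"
             }
         }
     }

Each branch writes matching events to its own daily index, which is what makes the per-source retention discussed below possible.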


(Acropia) #3

Thanks for your reply!

Can you explain your choices/decisions on splitting up indices? What is your reason to split?

  1. Do you split each Logstash Pipeline into a different index?
  2. Do you split each Filebeat Module into a different index?
  3. Do you split up based on corresponding sources (i.e. all Apache logs, whichever way they pass to Elasticsearch)?

(Lewis Barclay) #4

The reason I prefer to split is that I get more customisation, like the ability to retain some indices longer than others. For example, if you lump filebeat, metricbeat and heartbeat all into a single "beat" index, then you can only keep that index for one specified period of time, e.g. 30 days.

If you split them into heartbeat-*, filebeat-* and metricbeat-* instead, you could keep heartbeat for 7 days, filebeat for 60 days and metricbeat for 30 days, for example.
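As a sketch of how such a retention rule could be enforced: if you're on Elasticsearch 6.6 or later, an index lifecycle policy can delete indices once they reach a given age (the policy name here is just an example; you would still need to attach it to the matching index template):

     PUT _ilm/policy/heartbeat-7d
     {
         "policy": {
             "phases": {
                 "delete": {
                     "min_age": "7d",
                     "actions": {
                         "delete": {}
                     }
                 }
             }
         }
     }

A separate policy per index pattern (heartbeat-*, filebeat-*, metricbeat-*) gives each source its own retention period.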

  1. Again it depends. I keep all metricbeat data, for example, in the same index, but use the "beat.name" field to make filtering easier. For example, I could have some Delivery Controllers with a beat name of "DCs" and some file servers with a beat name of "FileServer"; that way I can say "show me the CPU usage for all DCs" and not get every server in the index's results. Does that make sense? Again, it's down to preference!
  2. (and 3.) I don't do much with Filebeat specifically at the moment, other than for web servers, where I collect NGINX logs. Again, I use the beat name field here to make filtering easier rather than splitting the index up. But again, whatever is best for you. If you have an X-Pack subscription and use user security, you might want to consider when it's best to split, because you may want some users to only see the log files for webserver 1 and 2, but not webserver 3 and 4.
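For reference, the beat name used for filtering above is set at the top of each Beat's config file; a minimal sketch for filebeat.yml (the value "DCs" just follows the example above):

     # filebeat.yml
     # Sets the beat.name field on every event this Beat ships
     name: "DCs"

The same `name` option works for Metricbeat and Heartbeat, since it's a general Beats setting.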