We want to use multiple instances of the different Beats (Filebeat, Metricbeat, Winlogbeat, Packetbeat and Heartbeat):
Metricbeat on multiple servers for monitoring, Filebeat on multiple servers for logs and so on.
They should collect the information so that it can be visualized with Kibana in the end.
To start, I set up 3 Elasticsearch nodes on 3 servers, and Logstash and Kibana on another one.
Now my first question is:
What is the best practice to connect all the components?
Should all Beats send everything to Logstash first, so it can be forwarded to Elasticsearch from there? Or should only the Beats that need further processing of their data, like Filebeat, send to Logstash, while e.g. Metricbeat outputs directly to Elasticsearch? Is a mix of sending directly to Elasticsearch and sending to Logstash first a good idea?
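For reference, these are the two output options I am weighing in each Beat's config (hosts are placeholders for our servers):

```yaml
# Option A: ship to Logstash first (e.g. in filebeat.yml)
output.logstash:
  hosts: ["logstash-host:5044"]

# Option B: ship directly to Elasticsearch (e.g. in metricbeat.yml)
# Note: only one output may be enabled per Beat instance.
output.elasticsearch:
  hosts: ["es-node1:9200", "es-node2:9200", "es-node3:9200"]
```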
My second question would be:
I saw in the metricbeat.yml that there is a Kibana configuration, which is required if I want to use the predefined dashboards. Does this setting bypass Logstash and Elasticsearch altogether and just send the data to be visualized directly to Kibana? Or does it still use the defined output? If it uses the output, is there a difference between outputting to Logstash and outputting to Elasticsearch? Is further processing with Logstash still an option when using this?
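The section I mean looks like this in metricbeat.yml (the host is just the example value from the shipped config):

```yaml
# The Kibana configuration mentioned above, as it appears in metricbeat.yml
setup.kibana:
  host: "localhost:5601"
```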
My third question is:
What is the best practice for setting up the Logstash pipelines in such a setup? Should there be one pipeline per Beat-type, or one pipeline for all Beats, or even one pipeline per Beat instance?
Your best bet would be to look at using Fleet; while it doesn't cover all the Beats yet, it will, and it will be the easiest way to manage their deployment and config. To your questions though:
Depends, what sort of processing are you looking to do?
It only does the setup of the dashboards in Kibana, that's it.
One per data type, usually. But if you can use an ingest pipeline, you can remove Logstash entirely.
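A minimal sketch of what "one pipeline per data type" looks like in Logstash's pipelines.yml (paths are placeholders):

```yaml
# pipelines.yml — one Logstash pipeline per data type
- pipeline.id: logs
  path.config: "/etc/logstash/conf.d/logs.conf"
- pipeline.id: metrics
  path.config: "/etc/logstash/conf.d/metrics.conf"
```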
The main objective with all this is to learn more about the Elastic Stack and how the components work together. So using the Beats will be our first step into the Elastic world, and maybe we can generate some useful information along the way. But in the future we will have more specific use cases, like generating a variety of reports from our digital office solution, or using Elastic for... you know, for search... in our own application, which is a digital archive in the construction context. So we are not looking for easy management, but for full functionality and configurability.
It is quite certain that Logstash will become a necessity in the future, but for now we want to see which data is generated within our infrastructure and how we can visualize it in a meaningful way. In particular, we are looking for a way to spot any abnormalities or deviations from normal behaviour, also over longer periods of time. So my assumption was that processing the logs with Logstash would be required to aggregate such information in the most customizable manner. Even if we do not need it now for this use case, I very much assume it will be required for the report generation from the database of the digital office solution.
So the question remains whether it is a good idea to use Logstash only where we really need it, as described above, and not where we don't, like for simple system metrics. Or whether the better approach, also in terms of performance, would be to send all input through Logstash regardless of the need to process it, so the Elasticsearch cluster has a more streamlined input and does not have to deal with multiple sources.
Can you elaborate on "one per data type"? For example which Beats share a common output data type and which don't?
Is "using an ingest pipeline" using the output.elasticsearch for a Beat, or is it a whole different story? When using this output option is it necessary to configure an ingest pipeline in Elasticsearch or is configuration on the Beat side enough?
If you need to do extra processing beyond what Elasticsearch and an ingest pipeline can do, then use Logstash. Otherwise Elasticsearch will be fine.
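On the second part of your question: "using an ingest pipeline" still means using output.elasticsearch on the Beat, with one extra setting pointing at a pipeline. Configuration on the Beat side alone is not enough, though — the pipeline itself must first be created in Elasticsearch. A sketch, with a made-up pipeline name:

```yaml
output.elasticsearch:
  hosts: ["es-node1:9200"]
  # "my-beat-pipeline" is a hypothetical pipeline that must already
  # exist in Elasticsearch (created via PUT _ingest/pipeline/my-beat-pipeline)
  pipeline: "my-beat-pipeline"
```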
All Beats use ECS, which helps normalise things. But Filebeat ships logs and Metricbeat ships metrics; those are basically the two logical types of data that you'd want to keep separate.
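If you do end up routing several Beats through the same Logstash input, you can still keep the data types apart by branching on the Beat name that every event carries in its @metadata. A rough sketch (hosts and index names are just examples):

```conf
# Route events to separate indices based on the originating Beat
input {
  beats { port => 5044 }
}
output {
  if [@metadata][beat] == "metricbeat" {
    elasticsearch {
      hosts => ["es-node1:9200"]
      index => "metricbeat-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["es-node1:9200"]
      index => "filebeat-%{+YYYY.MM.dd}"
    }
  }
}
```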