Sorry for the rambling topic, but I am falling down a rabbit hole. Every time I feel like I get a handle on ELK terminology and infrastructure, the floor drops.
I have never completely understood why you would ship from a beat directly to ES. Isn't ES only responsible for the storage and indexing of data for querying? I thought that for parsing log events you had to use Logstash. Does that mean parsing can now take place at the beats level before the data is shipped out, so that in some cases Logstash is not even needed?
And pipelines? What are these, and how are they different from shipping via the beat to Logstash? Why would I use one over the other? I did try to pull this information from the documentation. To my knowledge I have never set up a pipeline, and I have tons of data shipping. Am I missing out on something?
Finally, modules. I am not totally clear on what they accomplish. I thought they were predefined parsing methods for known products, but if that is the case, then why are they tied to beats and not to Logstash, which I thought did all of the parsing? And why are they tied to pipelines?
Is there any documentation that gives a broad overview of this? I can only find details for each piece in the documentation, but not why or how they all act together.
I'm going to start with an example. Say you use filebeat, enable the apache module, output to Elasticsearch, and have the proper authority to create templates, ingest pipelines and ILM policies. The first filebeat of each version (6.7.1, 6.8.2, etc.) to start will set up the templates and ingest pipelines for the apache module. (Or you can use the filebeat setup commands.)
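For concreteness, a minimal sketch of that setup, assuming a 7.x-style config (hostnames and credentials are placeholders; on 6.x the module is spelled apache2):

```
# filebeat.yml - ship the apache module straight to Elasticsearch
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

output.elasticsearch:
  hosts: ["es01:9200"]
  username: "filebeat_writer"   # needs rights to create templates,
  password: "changeme"          # ingest pipelines and ILM policies
```

```
filebeat modules enable apache
filebeat setup --pipelines --modules apache   # or just: filebeat setup
```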
There is a tool that will convert an Elasticsearch ingest pipeline to a Logstash pipeline. It may not convert all parts of all pipelines, but I picked apache because I think it converts most or all of it. The output is the filter section of a Logstash pipeline; add input and output sections to complete it. Now you could change the filebeat above to output to this Logstash pipeline and the results should be equivalent.
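If the tool meant here is Logstash's ingest converter (my assumption; `bin/ingest-convert.sh` ships with 6.x/7.x Logstash), the workflow is roughly this, with made-up file paths:

```
# export the module's ingest pipeline to JSON first, then:
bin/ingest-convert.sh --input file:///tmp/apache-ingest.json \
                      --output file:///tmp/apache-filter.conf
```

Wrapped into a complete pipeline, it would look something like this (port and hosts are placeholders):

```
input {
  beats { port => 5044 }
}
filter {
  # paste the converted filter section here - for apache it is
  # mostly the grok, date and geoip processors translated to filters
}
output {
  elasticsearch { hosts => ["es01:9200"] }
}
```

Then point filebeat at it by swapping `output.elasticsearch` for `output.logstash` with `hosts: ["logstash01:5044"]`.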
We have more experience with Logstash, and I find it easier to test and develop in. We have systems with some logs that don't have filebeat modules, so Logstash gives us a common front end.
Recently, we encountered an ingest pipeline that refused to convert to Logstash. So I loaded the module's ingest pipeline into Elasticsearch, added the pipeline option to Logstash's elasticsearch output, and let Elasticsearch parse the logs for that module.
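That combination is just one option on the elasticsearch output. A sketch, assuming filebeat is setting `[@metadata][pipeline]` for module-managed inputs (it does in recent 6.x/7.x releases):

```
output {
  elasticsearch {
    hosts    => ["es01:9200"]
    # hand the event to the module's ingest pipeline so that
    # Elasticsearch, not Logstash, does the parsing
    pipeline => "%{[@metadata][pipeline]}"
  }
}
```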
There are pros and cons to either method. Beats is pretty good at HA: if it can't send logs, it will send them when it can (up to ignore_older), and it will round-robin / fail over across a list of Elasticsearch targets. However, Logstash still has a role for things like the syslog protocol and persistent queues.
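For reference, both of those behaviors are plain config. On the beats side (hostnames are placeholders; the elasticsearch output distributes across the list by default, while the logstash output needs `loadbalance: true`):

```
# filebeat.yml
output.elasticsearch:
  hosts: ["es01:9200", "es02:9200", "es03:9200"]
```

And on the Logstash side, the persistent queue:

```
# logstash.yml
queue.type: persisted   # default is the in-memory queue
queue.max_bytes: 4gb    # example cap, tune to your disk budget
```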
Thanks for the reply, but I guess I am still not clear on the pros and cons of an ingest pipeline vs. straight output from filebeat, regardless of the destination.