Once ingest nodes become available, why would I choose an ingest node to process my data instead of my existing Logstash pipeline? Is there a performance gain from doing the processing at the Elasticsearch level? We already use dedicated client nodes to send data to the cluster. Would an ingest node be better than a dedicated client node? I'd just like to be sure I'm weighing the benefits of this new node type properly before I dedicate effort to implementing it.
Ingest might simplify your architecture in some simple cases like "I just want to tail a file".
In such a case, using Filebeat + ingest would be fairly easy to set up.
I'd keep using Logstash for more advanced needs, like reading from an input that Beats doesn't support (Twitter, JDBC, ...) and/or writing to another datastore, such as archiving to S3 or HDFS, which ingest will never do.
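To make the "Filebeat + ingest" case a bit more concrete, here is a minimal sketch of what such a pipeline definition could look like, built as a plain Python dict so it can be printed as JSON. The Apache access log scenario, the function name `build_access_log_pipeline`, and the field names are assumptions for illustration only; the field that grok extracts the timestamp into depends on your Elasticsearch version and pattern set, so adjust accordingly.

```python
import json


def build_access_log_pipeline():
    """Return a hypothetical ingest pipeline body (grok + date + remove).

    This mirrors what a simple Logstash grok/date/mutate filter chain would do.
    Field names such as "timestamp" are assumptions and may differ by version.
    """
    return {
        "description": "Parse Apache access logs shipped by Filebeat (illustrative)",
        "processors": [
            # Split the raw log line into structured fields.
            {"grok": {"field": "message", "patterns": ["%{COMBINEDAPACHELOG}"]}},
            # Parse the extracted timestamp; the date processor writes to @timestamp by default.
            {"date": {"field": "timestamp", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"]}},
            # Drop the now-redundant string timestamp.
            {"remove": {"field": "timestamp"}},
        ],
    }


if __name__ == "__main__":
    # Print the JSON body you would store under _ingest/pipeline/<your-pipeline-name>.
    print(json.dumps(build_access_log_pipeline(), indent=2))
```

The printed JSON would be stored with a PUT to `_ingest/pipeline/<name>`, and Filebeat's Elasticsearch output would then reference that name through its `pipeline` setting.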
Sure, that makes sense. So there is no benefit or incentive to move filter processing (grok, mutate, other specific filters) OUT of Logstash and INTO an ingest node, is that correct? It sounds like ingest nodes are intended more as a quick drop-in solution for people who already have a working ES cluster and don't want to dedicate effort to standing up Logstash.
If you are just doing grok/mutate things, it is definitely worth the cost of moving to ingest, for the reasons I gave above. At the moment it might be faster than doing the same work inside Logstash (I'm not sure about this though, so test it first).
But you have to run it on nodes that aren't already under heavy load, or on dedicated ingest nodes.
Also, it's something new, so you definitely need to test it with your use case to make sure it behaves as you expect.
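One way to do that testing without indexing anything is the ingest simulate API (`POST _ingest/pipeline/_simulate`), which takes a pipeline definition plus sample documents and returns the transformed documents. The sketch below only builds the request body; the helper name `build_simulate_request`, the sample pipeline, and the sample log line are assumptions for illustration.

```python
import json


def build_simulate_request(pipeline, sample_messages):
    """Build a body for POST _ingest/pipeline/_simulate.

    Each sample message becomes one test document with a "message" field,
    which is the field Filebeat uses for the raw log line.
    """
    return {
        "pipeline": pipeline,
        "docs": [{"_source": {"message": msg}} for msg in sample_messages],
    }


if __name__ == "__main__":
    # Hypothetical pipeline: extract a log level and a message body, then lowercase the level.
    pipeline = {
        "description": "Illustrative grok + lowercase pipeline",
        "processors": [
            {"grok": {"field": "message",
                      "patterns": ["%{LOGLEVEL:level} %{GREEDYDATA:body}"]}},
            {"lowercase": {"field": "level"}},
        ],
    }
    body = build_simulate_request(pipeline, ["ERROR disk is almost full"])
    print(json.dumps(body, indent=2))
```

Sending this body with representative lines from your own logs shows whether the processors produce the fields you expect before you move any filters out of Logstash.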
Are you using Beats already? Or collecting logs with Logstash?
Yes, we are already using Beats, but with its Logstash output, where we do further processing of the data. We also use Logstash to collect data via the logstash-kafka input.
Why would grok/mutate operations be faster in an ingest node than inside Logstash? Is that purely because less data is sent over the network? Also, does ingest have access to the more advanced operations that Logstash has filters for, like DNS, geoip, de_dot, json, ruby, etc.?