I want to capture and process failures related to Elasticsearch being down. The Elasticsearch output does not offer a way to handle this, so I want to do a POC and experiment with indexing events into Elasticsearch using the http filter instead of the Elasticsearch output, so that I can use the "tags" field to know whether a failure happened.
Assuming the POC works functionally as expected with the http filter, can I reach performance similar to what the Elasticsearch output achieves (connection pooling, bulk requests, compression, etc.)?
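For reference, here is a minimal sketch of the filter block I have in mind (the URL, index name, and body mapping are placeholders, and I am assuming the filter's default failure tag of "_httprequestfailure"):

```
filter {
  # POST each event to Elasticsearch ourselves instead of using the
  # elasticsearch output. On a connection failure the http filter
  # tags the event rather than retrying forever.
  http {
    url         => "http://localhost:9200/poc-index/_doc"
    verb        => "POST"
    body_format => "json"
    body        => { "message" => "%{message}" }
  }
}
```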
Logstash has an at-least-once delivery model. If Elasticsearch is down, the regular queues will back up. If they fill up, Logstash will stop processing events and stop reading from the inputs.
So the http filter won't overcome the backing-up issue, even though the tags field in the event would indicate a failure?
Or will the http filter not deliver performance as good as the elasticsearch output's, and maybe I am misusing it?
What I am thinking is: after the http filter reports a failure, I would save the event somewhere else, and a cron job would do a health check, pick those events up, and try to index them again once Elasticsearch is back up.
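Concretely, something like this is what I am picturing for the output side (the file path is a placeholder, and again I am assuming the default "_httprequestfailure" tag):

```
output {
  if "_httprequestfailure" in [tags] {
    # The http filter could not reach Elasticsearch; park the event
    # on disk so the cron job can re-index it once the cluster is up.
    file {
      path  => "/var/spool/logstash/failed-events.jsonl"
      codec => json_lines
    }
  } else {
    # The http filter already indexed the event, so nothing more is
    # needed here; a cheap output keeps the pipeline valid.
    stdout { codec => dots }
  }
}
```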
An elasticsearch output sends a batch of events (perhaps 125) to Elasticsearch in a single _bulk API call. An http filter makes an API call for each event, so the performance will be far worse.
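For comparison, the stock output for the same POC would look like this (hosts and index are placeholders); each pipeline batch, 125 events by default, goes out as a single _bulk request over pooled connections:

```
output {
  elasticsearch {
    # One _bulk request per pipeline batch (pipeline.batch.size,
    # default 125), with connection pooling and keep-alive.
    hosts => ["http://localhost:9200"]
    index => "poc-index"
  }
}
```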
If Elasticsearch is down, events back up behind the output. The filter will get an error when one of the timeouts fires (all of the timeouts default to ten seconds). So you can detect Elasticsearch being down using the filter, but it is going to be slow.
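If you do go the filter route, you can at least detect the outage faster by lowering the client timeouts on the filter (the option names come from the shared HTTP client mixin; the two-second values are just illustrative):

```
filter {
  http {
    # ... same url/verb/body as in the POC sketch above ...
    url             => "http://localhost:9200/poc-index/_doc"
    verb            => "POST"
    # Fail fast instead of waiting out the default timeouts.
    connect_timeout => 2
    socket_timeout  => 2
    request_timeout => 2
  }
}
```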