Hello everyone,
I just have a question about error handling in Logstash.
For example, I have a cluster like:
Kafka -> Logstash -> Elasticsearch
and this Logstash is also connected to HDFS, so the flow is either
Kafka -> Logstash -> Elasticsearch
or
Kafka -> Logstash -> HDFS
(by the way, I am consuming from two different topics).
My question is: how can I stop consuming from the HDFS topic in Kafka if HDFS is not reachable, and likewise for the Elasticsearch topic?
I am sure that if I used Spark Streaming instead of Logstash there would be an error-handling mechanism that lets me do this, but I don't know at all how it can be done with Logstash. Can you please give me some information about that?
Thank you all
My question is: how can I stop consuming from the HDFS topic in Kafka if HDFS is not reachable, and likewise for the Elasticsearch topic?
That's what Logstash does. If one or more outputs are blocked, the whole Logstash pipeline stalls and stops reading from its inputs.
Check out the pipeline documentation:
https://www.elastic.co/guide/en/logstash/current/pipeline.html
Events are created by input threads and passed through a zero-length queue to worker threads that perform the filter/output stages, so in your case you have two input threads plus ES and HDFS output worker threads. Logstash relies on "backpressure" from outputs to handle the case where an output isn't working: if one of the downstream systems stops responding, its output threads eventually block, and the pipeline builds back pressure (ceases processing) until the blockage clears.
So how do you stop consumption for either? It just happens when the system is unavailable. Keep in mind that because both streams share one pipeline, if one of the output systems goes down it will likely stop the processing of all of your events, including the ones destined for the healthy system.
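If you need the two streams to fail independently, the usual workaround is to give each one its own pipeline, with its own input, queue, and output. A sketch, assuming Logstash 6.0+ (which added multiple-pipeline support) and made-up config paths:

```
# pipelines.yml -- each pipeline gets its own threads and queue,
# so HDFS backpressure no longer stalls the Elasticsearch stream.
- pipeline.id: es_stream
  path.config: "/etc/logstash/conf.d/es_stream.conf"
- pipeline.id: hdfs_stream
  path.config: "/etc/logstash/conf.d/hdfs_stream.conf"
```

On versions without multiple-pipeline support, you can get the same isolation by running two separate Logstash processes, one per topic.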