I have just discovered the http_poller input plugin, and it seems like it could replace some custom scripts that we use to get monitoring data (e.g. index growth) into Elasticsearch. But it is not clear to me whether I can restrict the poller to run on only one node in a Logstash cluster without making a special configuration for the chosen node, which goes against the whole idea of clustering.
Maybe it is not important to do the polling from only one node, and maybe it is not important to avoid copies of the polled data in the index. But if there is a way to keep 3 nodes from polling almost exactly the same data, I would be interested in knowing how.
Can you provide more context on what you are trying to do and what your issue is? It is not clear, since you didn't share any configuration or logs.
Logstash does not run as a cluster. Each Logstash instance is independent of the others, so it is not clear what you mean by a Logstash cluster.
By cluster I mean that we have 3 identically configured Logstash nodes accepting input from a large number of application nodes and sending data to an Elasticsearch cluster. The 3 Logstash nodes might not technically be a cluster, but we take great care never to stop all of them at the same time, so that our applications will always be able to ship/beat their logs to the Elasticsearch cluster.
The 3 Logstash nodes are monitored by a product called CheckMK that runs a little shell/curl script that retrieves performance data about the pipeline with something like this:
And if the queue size grows above 50% or 90% of the max queue size, CheckMK will raise an alarm.
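For readers who want to see the threshold logic spelled out, here is a minimal Python sketch of that kind of check. It is not the actual CheckMK script. It assumes the stats JSON has a `pipelines` object whose entries carry a `queue` object with `queue_size_in_bytes` and `max_queue_size_in_bytes` (the fields I would expect for a persisted queue), and the level names "warning" and "critical" are my own labels for the 50% and 90% thresholds.

```python
from typing import Optional

# Thresholds from the post: alarms at 50% and 90% of the max queue size.
# The level names used below are illustrative, not CheckMK terminology.
WARN_RATIO = 0.50
CRIT_RATIO = 0.90


def queue_fill_ratio(pipeline_stats: dict) -> Optional[float]:
    """Return queue size divided by max queue size, or None if unavailable.

    Assumes `queue_size_in_bytes` and `max_queue_size_in_bytes` live under
    the pipeline's `queue` object (an assumption about the stats layout).
    """
    queue = pipeline_stats.get("queue", {})
    size = queue.get("queue_size_in_bytes")
    max_size = queue.get("max_queue_size_in_bytes")
    if size is None or not max_size:
        # Missing fields (or a zero max) mean the ratio cannot be computed.
        return None
    return size / max_size


def classify(ratio: Optional[float]) -> str:
    """Map a fill ratio to an alarm level."""
    if ratio is None:
        return "unknown"
    if ratio > CRIT_RATIO:
        return "critical"
    if ratio > WARN_RATIO:
        return "warning"
    return "ok"


def check_node_stats(node_stats: dict) -> dict:
    """Return a mapping of pipeline name to alarm level."""
    pipelines = node_stats.get("pipelines", {})
    return {name: classify(queue_fill_ratio(stats))
            for name, stats in pipelines.items()}
```

You would pass `check_node_stats` the parsed JSON from the node's stats endpoint. Pipelines without the size fields come back as "unknown" rather than "ok", so a missing metric is not mistaken for a healthy queue.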
My intention was to have Logstash do the HTTP polling and write the result to an index which Grafana (another monitoring tool) would look at and raise the alarm.
Now that I think about it, I actually need all 3 to report their local queue size, but I imagine that there are other situations where I do not want 3 nodes all polling the same HTTP source and reporting almost identical results (almost, because a small delay might produce a slightly different result from the HTTP source).
There is nothing in Logstash that would allow you to do that. As mentioned before, Logstash does not work as a cluster; every Logstash instance is independent of the others. If you want only one instance polling an endpoint, then you need to have your polling configuration in only one instance.