I had a strange problem the other day and could not figure out what was happening.
In my cluster, about 30% of the data nodes were brought down (intentionally). The cluster went red as expected, but the master-eligible nodes still had quorum, so the cluster itself stayed up and running.
I also run many Logstash instances that are identical in configuration and all write to the same index alias, A.
The (ILM-managed) index behind alias A was green while the cluster was red, so I expected all my Logstash instances to be able to write to it. What actually happened was stranger: about two-thirds of the instances were not writing to the index at all, while the rest were writing just fine. It would have made more sense if either all of them were writing or none of them were.
I tested writing some documents to index A manually over HTTP, and that worked fine.
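For reference, the manual test was roughly the following (Dev Tools console syntax; the document body is just an illustrative placeholder):

```
POST A/_doc
{
  "message": "manual write test"
}
```

This returned a successful `created` response, so the alias itself was accepting writes.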
The cluster was not overloaded in any way while it was red (I checked the write and search thread pool queues, and all looked fine), so it was not that bulk requests from Logstash were being rejected.
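The thread pool check was along these lines (Dev Tools console syntax; the `h=` column list is just the set of columns I looked at):

```
GET _cat/thread_pool/write,search?v&h=node_name,name,active,queue,rejected
```

The `queue` and `rejected` counts were low or zero across all remaining nodes.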
There was nothing apparent in the Logstash logs, only messages about the nodes that were intentionally brought down, which is expected. The `hosts` setting of my Logstash instances is configured with all non-master nodes in the cluster.
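To make the setup concrete, the relevant part of each Logstash pipeline looks roughly like this (hostnames are illustrative; the real list contains every non-master node, including the ones that were brought down):

```
output {
  elasticsearch {
    # illustrative hostnames, not the real node names
    hosts => ["http://data-node-1:9200", "http://data-node-2:9200", "http://data-node-3:9200"]
    index => "A"
  }
}
```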
Any idea what I am missing here?
Many thanks for your help!