I would like to set up a highly available Logstash deployment with a highly available Redis queue. My idea is to have Logstash shipping instances (configured on the app servers) send logs to Redis and, from there, have Logstash indexers work on the data as mentioned in this section.
I am planning to have the architecture below.

Multiple shipping instances --> 2 Redis servers --> 2 Logstash instances --> Elasticsearch
To make Redis highly available, I am planning to run 2 Redis servers to cover failover. The Filebeat Redis output plugin is really good, and it can be set up to load balance across Redis servers as well, as given here.
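To make this concrete, this is roughly what I have in mind for the Filebeat side (the host names and the `filebeat` key are placeholders):

```yaml
# filebeat.yml on each shipping instance
output.redis:
  hosts: ["redis1.example.com:6379", "redis2.example.com:6379"]
  key: "filebeat"        # Redis list that events are pushed onto
  datatype: "list"
  loadbalance: true      # spread events across both Redis servers
```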
On the indexer side, I am planning 2 Logstash indexer instances so that indexing stays highly available and keeps working even if one of them fails. Here, the Redis input plugin for Logstash doesn't come with load balancing, but we can set up multiple Redis inputs. With this, I have the following questions:
1. Since both Logstash indexer instances have both Redis inputs configured, is one event from Redis sent to both Logstash indexer instances? In other words, is each event duplicated here?
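For reference, this is the indexer input configuration I mean, identical on both Logstash indexers (host names, key, and the Elasticsearch endpoint are placeholders):

```conf
input {
  redis {
    host      => "redis1.example.com"
    port      => 6379
    data_type => "list"
    key       => "filebeat"   # must match the key used by the shippers
  }
  redis {
    host      => "redis2.example.com"
    port      => 6379
    data_type => "list"
    key       => "filebeat"
  }
}
output {
  elasticsearch {
    hosts => ["es.example.com:9200"]
  }
}
```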
2. Following up on the previous question: how does Redis know that one Logstash instance has already read an event, so that it should not send it to the other one? I was thinking that once a Logstash instance reads an event from Redis, it is popped off the list with BLPOP and there is no way for the other Logstash instance to read the same event. But this section, Managing Throughput Spikes with Message Queueing, tells me otherwise:
Adding a message broker to your Logstash deployment also provides a level of protection from data loss. When a Logstash instance that has consumed data from the message broker fails, the data can be replayed from the message broker to an active Logstash instance.
How can Redis replay the message if it has already been read by a Logstash instance and hence popped off the list?
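To state my assumption concretely, here is how I understand BLPOP semantics, sketched with a plain in-memory deque standing in for the Redis list (no real Redis client involved):

```python
from collections import deque

# Stand-in for the Redis list the shippers push onto.
queue = deque()

# Shipper pushes three events (RPUSH semantics: append on the right).
for event in ["event-1", "event-2", "event-3"]:
    queue.append(event)

# Each indexer does the equivalent of BLPOP: the pop is atomic and
# destructive, so no two consumers can ever receive the same element.
indexer_a = queue.popleft()
indexer_b = queue.popleft()

print(indexer_a, indexer_b, list(queue))  # event-1 event-2 ['event-3']
```

If that is right, once an event has been handed to one indexer there is nothing left in Redis to replay, which is what I find confusing about the quoted paragraph.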
3. This one is regarding the shipping instances. If I install Filebeat on the app servers and configure prospectors as given here, and I then have an issue with Filebeat reading from the directory set up in the configuration, how can I still achieve high availability? How can I configure other shipping instances to read from the same directory, which is on a different server, as was explained in the final diagram here? Or does this case not really arise, since the shipping instances are lightweight?
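For reference, the prospector configuration on each app server would look roughly like this (the log path is a placeholder for my application's log directory):

```yaml
# filebeat.yml prospectors section on each app server
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/myapp/*.log
```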