Highly available Logstash with Redis


(Krishna Chaitanya) #1

I would like to set up highly available Logstash with a highly available Redis queue. My idea is to have Logstash shipping instances (configured on the app servers) send logs to Redis, and from there, Logstash indexers process the data as mentioned in this section.

I am planning the following architecture:

Multiple shipping instances --> 2 Redis servers --> 2 Logstash instances --> Elasticsearch

To make Redis highly available, I am planning to have 2 Redis servers in case of failover. The Filebeat Redis output plugin is really good and can load-balance across Redis servers as well, as described here.
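A minimal sketch of the Filebeat side of this two-Redis setup, assuming Filebeat 5.x option names (matching the `filebeat.prospectors` syntax used later in this thread) and hypothetical host names `redis1` and `redis2`. Check the option names against the Redis output documentation for your Filebeat version.

```yaml
# Filebeat shipper: send events to two Redis servers.
# Host names below are hypothetical placeholders.
output.redis:
  hosts: ["redis1:6379", "redis2:6379"]
  # Redis key the events are written to; the Logstash redis inputs
  # on the indexers must read from the same key.
  key: "filebeat"
  # true:  spread events across both hosts.
  # false: send to one host and switch to the other only if it becomes unreachable.
  loadbalance: true
```

Note that this is not replication: the two Redis servers are independent, so events already queued on a server that fails stay there (or are lost, if it has no persistence) until it comes back. The second server keeps new events flowing; it does not protect events already queued.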

On the indexer side, I am planning 2 Logstash indexer instances so that indexing keeps working even if one of them fails. The Redis input plugin for Logstash doesn't support load balancing, but we can set up multiple Redis inputs. This leads to the following questions:

  1. Since both Logstash indexer instances have 2 Redis inputs configured, is each event from Redis sent to both Logstash indexer instances? So each event is duplicated here, right?

  2. Following on from the previous question, how does Redis know that one Logstash instance has already read an event and that it should not send it to the other one? I was thinking that once a Logstash instance reads an event from Redis, it is popped using BLPOP, and there is no way for the other Logstash instance to read the same event. But the section Managing Throughput Spikes with Message Queueing suggests otherwise:

Adding a message broker to your Logstash deployment also provides a level of protection from data loss. When a Logstash instance that has consumed data from the message broker fails, the data can be replayed from the message broker to an active Logstash instance.

How can Redis replay the message if it has already been read by a Logstash instance and therefore popped?

  3. This one is about the shipping instances. If I install Filebeat on the app servers and configure prospectors as given here, and Filebeat then has trouble reading from the directory set in the configuration, how can I still achieve high availability? How can I configure other shipping instances to read from the same directory on a different server, as explained in the final diagram here? Or does this case not arise, because shipping instances are lightweight?


(Magnus Bäck) #2

Since both Logstash indexer instances have 2 Redis inputs configured, is each event from Redis sent to both Logstash indexer instances? So each event is duplicated here, right?

That depends on the setup. If you use a Redis list, each list item will reach exactly one Logstash instance, but I'd expect a pub/sub setup to lead to duplication, since each subscriber gets every published message. I believe there are ways to avoid this with Redis, but it's not clear that Logstash supports them.
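On the shipper side, the list-versus-pub/sub choice described above corresponds to the Filebeat Redis output's `datatype` setting (Filebeat 5.x naming assumed; host names are hypothetical). With `list`, events are appended to a Redis list and each item is removed by whichever Logstash instance pops it first; with `channel`, events are published and every subscribed indexer receives a copy. The Logstash redis input's `data_type` has to match the value chosen here.

```yaml
output.redis:
  hosts: ["redis1:6379", "redis2:6379"]  # hypothetical host names
  key: "filebeat"
  # "list":    events are appended to a list; each event is consumed by exactly one reader.
  # "channel": events are published to a channel; every subscriber gets every event.
  datatype: "list"
```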

This seems more like a Redis question than a Logstash question. I don't think there are that many Redis experts around here.

How can Redis replay the message if it has already been read by a Logstash instance and therefore popped?

The described behavior is broker-dependent. It'll definitely work with Kafka and can work with RabbitMQ, depending on the configuration.
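If you move to Kafka, only the shipper's output section changes. A minimal sketch, assuming Filebeat 5.x's Kafka output, a hypothetical broker `kafka1`, and a hypothetical topic `app-logs`. The replay behavior comes from Kafka itself rather than from this config: Kafka keeps messages for a retention period instead of deleting them on read, and Logstash indexers sharing a consumer group split the topic's partitions between them. If one indexer fails, its partitions are reassigned to the survivor, which resumes from the last committed offset.

```yaml
output.kafka:
  hosts: ["kafka1:9092"]   # hypothetical broker address
  topic: "app-logs"        # hypothetical topic name
  # Wait for the partition leader to acknowledge each batch.
  required_acks: 1
```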

3. This one is about the shipping instances. If I install Filebeat on the app servers and configure prospectors as given here, and Filebeat then has trouble reading from the directory set in the configuration, how can I still achieve high availability? How can I configure other shipping instances to read from the same directory on a different server, as explained in the final diagram here? Or does this case not arise, because shipping instances are lightweight?

Instances of Filebeat are independent and don't talk to each other. Pointing two instances to the same log files will lead to duplication.
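The duplication follows from where Filebeat keeps its state: each instance records how far it has read in every file in its own registry file, and nothing is shared between instances. A sketch using the Filebeat 5.x setting name (newer versions moved this setting, so check the documentation for your version):

```yaml
# Each Filebeat instance keeps its own registry of per-file read offsets.
# Two instances reading the same files keep two separate registries,
# so each one ships every line and Elasticsearch receives every event twice.
filebeat.registry_file: ${path.data}/registry
```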


(Krishna Chaitanya) #3

Thanks for the response Magnus.

So, if I have multiple app servers writing logs to their respective /var/log/app/*.json files, I should have one shipper instance on each of these app servers, with a configuration like this:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.json

Is this correct?
Is there any way I can make this highly available (i.e. tolerate one shipping instance failing)?
Or is this already highly available?


(Magnus Bäck) #4

Is this correct?

I don't know the syntax details by heart, but it looks reasonable.

Is there any way I can make this highly available (1 shipping instance failed)?

Shippers are sufficiently simple that they shouldn't fail, and if they do it's quite likely that the condition would affect failover shippers too. That is, I don't think high availability for shippers is very interesting. I suggest you instead look into alerting when messages stop arriving, e.g. using Lovebeat.

Or is this already highly available?

No.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.