Balancing logs flowing from multiple services

We have an ELK deployment in which multiple logstash-forwarders (one per service log) push logs to Logstash, which sends them to Kafka; another Logstash instance then pulls those logs from Kafka and indexes them into an Elasticsearch cluster.

Can someone let me know:

  1. What is the best way to handle traffic spikes in such a deployment?
  2. Can we configure Kafka in such a way that a single logstash-forwarder (of a single service) does not hog the entire set-up? We want to ensure that all services can use the set-up fairly.
  3. How do we ensure that one index does not grow huge compared to the others? We have many daily indexes for several service logs, and we want to ensure that if one service's index (or its shards) becomes too large during a spike, we can block that service from sending logs so that cluster performance is not affected.

  1. Well, you already use Kafka as a buffer, so spikes shouldn't be a problem on the consumer end (though you might build up a backlog that takes some time to process). On the producer end you should be fine as long as log files aren't rotated out of existence too fast for LSF to pick up their contents and ship them to Kafka.
  2. You'll probably have more luck asking the Kafka folks that question.
  3. There's nothing built in for this, but you could obviously monitor the index size continuously (e.g. along the lines of the sketch below) and take some kind of action once it passes a threshold. Once over the threshold, Logstash could start dropping messages or divert them to a flat file or something. It's not entirely obvious how to signal to Logstash that the threshold has been met; maybe the translate filter could help.
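
For the monitoring part, something as simple as this, run periodically, could be a starting point (the index name pattern is only an example; what you do once a size crosses your threshold is up to you):

```
# List daily indices with their on-disk size in bytes; a small wrapper script
# can compare store.size against a threshold and update a quota table or alert.
curl -s 'localhost:9200/_cat/indices/logstash-*?h=index,store.size&bytes=b'
```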

For 3 - we were also thinking along those lines. But is there anything in Logstash that allows us to dynamically change the Logstash config without restarting it? Basically, what I am trying to achieve is that once a particular index becomes large, we want Logstash to switch to a different config that sends logs to some file instead. Or is it possible to achieve the same thing on the forwarder end?

You can't change the Logstash config without restarting, but did you look at the translate filter?

Yeah, I looked into the translate filter. It just seems to translate an event; I am not able to see how it will help us in our scenario.

Are you saying something like: when an index grows big, save that information somewhere, and before writing to Elasticsearch check that status, and if it signals that the index is huge then don't write to Elasticsearch but write somewhere else instead? I think this can be achieved even without the translate filter.

Are you saying something like: when an index grows big, save that information somewhere, and before writing to Elasticsearch check that status, and if it signals that the index is huge then don't write to Elasticsearch but write somewhere else instead?

Exactly. The translate filter would implement a lookup table that matches some key (like the originating server or application) and returns whether that application has reached its quota.
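
Something along these lines is what I had in mind (untested; the `application` field, the service names, and the output settings are only placeholders, and option names can differ slightly between versions of the translate plugin):

```
filter {
  translate {
    # Look up the originating application in a quota table and store the
    # result ("true" = over quota) in the over_quota field.
    field       => "application"
    destination => "over_quota"
    dictionary  => [ "serviceA", "true",
                     "serviceB", "false" ]
    # Anything not listed is assumed to be within its quota.
    fallback    => "false"
  }
}

output {
  if [over_quota] == "true" {
    # Divert events from over-quota services to a flat file instead.
    file { path => "/var/log/logstash/overflow-%{application}.log" }
  } else {
    elasticsearch { hosts => ["localhost:9200"] }
  }
}
```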

I think this can be achieved even without the translate filter.

Yeah, I'm sure there are other ways (e.g. the ruby filter comes to mind). One advantage of the translate filter is that it should be very quick (useful during peaks in the load).

When the translate filter is configured to use an external file, that file can periodically be reloaded. You could therefore, as Magnus suggested, have a mapping per application or logical flow which indicates to what extent events should be dropped, and that mapping can be modified periodically without restarting Logstash.
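
A sketch of that variant, assuming a YAML dictionary at a path of your choosing (the conditional outputs would stay as in Magnus's example above):

```
filter {
  translate {
    field            => "application"
    destination      => "over_quota"
    # External YAML file with lines like `serviceA: "true"`, maintained by
    # whatever job watches the index sizes.
    dictionary_path  => "/etc/logstash/quotas.yml"
    # Re-read the file at most this often (seconds) without restarting Logstash.
    refresh_interval => 300
    fallback         => "false"
  }
}
```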

Thanks to both of you.

Can we configure Kafka in such a way that a single logstash-forwarder (of a single service) does not hog the entire set-up? We want to ensure that all services can use the set-up fairly.

As discussed in the Kafka user group, this can be achieved using quotas, which were added in Kafka 0.9.0.
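
For reference, per-client quotas are set with the kafka-configs tool, roughly like this (the client name and rate are made up, and the quota is keyed on whichever client.id the producing Logstash instance uses):

```
# Cap the produce rate (bytes/sec) for one client.id; other clients keep the
# broker-wide default quota.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --add-config 'producer_byte_rate=1048576' \
  --entity-type clients --entity-name logstash-serviceA
```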