Logstash config file and Redis

Previously, we ran six different Logstash processes on each Logstash server (5 shippers and 1 indexer) plus 1 Redis process. Each shipper had its own configuration file, listened for incoming data on specific ports, and had its output section configured to send to one of two Redis servers (for load balancing and failover protection). The indexer process then had a Redis input and an Elasticsearch output.

With ES 2.x, we decided to standardize on a single logstash.conf file and therefore a single Logstash process + 1 Redis process, which should make things easier to manage, handle, and monitor. But I can't find any examples of using Redis in the middle like this.

But how do I specify multiple options for the output section? Specifically, what values can I key off of so that the shippers (syslog, snmptraps, beats, logs, etc.) all send their data to Redis first, and only the Redis output gets sent on to Elasticsearch?

You should use two instances: one for getting incoming events into Redis, one for getting outgoing events into ES.

What you want is conditionals: https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html#conditionals
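If you do keep everything in a single logstash.conf, a conditional on a marker field can split the two roles within one process. A minimal sketch, in which the ports, hosts, the `logstash` list key, and the `[@metadata][stage]` field are all assumptions rather than anything from your actual setup:

```
input {
  # Shipper-side inputs (hypothetical ports)
  beats { port => 5044 }
  tcp   { port => 5140 type => "syslog" }

  # Indexer-side input: read events back out of Redis and mark them
  redis {
    host      => "127.0.0.1"
    data_type => "list"
    key       => "logstash"
    add_field => { "[@metadata][stage]" => "indexer" }
  }
}

output {
  if [@metadata][stage] == "indexer" {
    # Events pulled from Redis go on to Elasticsearch
    elasticsearch { hosts => ["localhost:9200"] }
  } else {
    # Everything else goes into Redis first
    redis {
      host      => "127.0.0.1"
      data_type => "list"
      key       => "logstash"
    }
  }
}
```

`@metadata` fields are not included in the indexed event, so the marker never reaches Elasticsearch. A marker field is used here rather than keying on `[type]` because `type` set on an input is not reapplied to events that already carry one, so events coming back out of Redis keep their original type.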

So, just to confirm: the standard process, if you are using Redis, would be to have a shipper config with the inputs for all incoming streams, the appropriate filters, and an output to Redis, along with a separate indexer config that has a Redis input and an Elasticsearch output.


No, the other way.
Inputs with an output to Redis, and then a Redis input with the filters and an output to ES.
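In config terms, the split described above might look roughly like this sketch; the hosts, ports, and the `logstash` list key are hypothetical:

```
# shipper.conf — get events into Redis as fast as possible, no filters
input {
  beats { port => 5044 }
  tcp   { port => 5140 type => "syslog" }
}
output {
  redis {
    # Two Redis hosts for load balancing / failover
    host      => ["redis-a.example.com", "redis-b.example.com"]
    data_type => "list"
    key       => "logstash"
  }
}

# indexer.conf — pull from Redis, filter, then index into Elasticsearch
input {
  redis {
    host      => "redis-a.example.com"
    data_type => "list"
    key       => "logstash"
  }
}
filter {
  # grok / mutate / drop etc. live here, after the broker
}
output {
  elasticsearch { hosts => ["es.example.com:9200"] }
}
```

With this layout the shipper does nothing but accept and forward, so a backlog of filtering work accumulates in the Redis list rather than on the inbound sockets.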

So, the shipper config shouldn't have any filters and the data should just be passed to Redis. Then the indexer config takes care of the filters and sending the data to Elasticsearch?

Wouldn't that put events that should be dropped through Redis unnecessarily? What are the benefits of putting the filters in the indexer config?

Yep :slight_smile:

You get the events into your broker layer ASAP. If you have filters on the incoming side you may end up slowing things down upstream, whereas putting them after the broker means events just queue if your indexing instances are busy, and it allows you to scale easily (it's harder to scale the inbound side).