Filebeat traffic overwhelms ELK stack?


I currently have a 3-node ELK stack that collects logs from ~90 microsites (30 different applications, each with 2 Rails processes + 1 Sidekiq). I don't really know if 3 nodes are enough, but this setup worked fine with logstash-forwarder on each of the microsites, so I think I have enough resources, especially since some of the microsites are not that heavy.

My problem started after I upgraded from ELK 2.x to ELK 5.1.2 + Filebeat 5.1.2.
I'm not sure why, but it seems like Logstash (or ES?) can't easily handle all of the traffic coming from Filebeat, and I get error messages like this from Logstash:

retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$6@4b94c40b on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@65358f05[Running, pool size = 8, active threads = 8, queued tasks = 50, completed tasks = 412657]]"})
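
To make the numbers in that message concrete: it reports a bulk thread pool with 8 threads, all active, and a queue of capacity 50 that is full, so the next bulk request is rejected with a 429. The following is only a toy Python model of that behaviour, not Elasticsearch code; the class and method names are hypothetical.

```python
from collections import deque


class BoundedBulkPool:
    """Toy model of a fixed-size thread pool with a bounded queue.

    Defaults mirror the values in the error message (pool size 8,
    queue capacity 50). This is an illustration, not Elasticsearch code.
    """

    def __init__(self, pool_size=8, queue_capacity=50):
        self.pool_size = pool_size
        self.queue_capacity = queue_capacity
        self.active = 0
        self.queue = deque()

    def submit(self, request):
        """Return 200 if the request is accepted, 429 if it is rejected."""
        if self.active < self.pool_size:
            self.active += 1            # a free thread takes the request
            return 200
        if len(self.queue) < self.queue_capacity:
            self.queue.append(request)  # all threads busy: wait in the queue
            return 200
        return 429                      # threads busy and queue full: reject

    def finish_one(self):
        """A thread completes its request and picks up the next queued one."""
        if self.queue:
            self.queue.popleft()        # the thread stays busy with queued work
        elif self.active > 0:
            self.active -= 1            # nothing queued: the thread goes idle


pool = BoundedBulkPool()
statuses = [pool.submit(i) for i in range(60)]
print(statuses.count(200), statuses.count(429))
```

In this model a burst of 60 requests with no completions leaves 58 accepted (8 running plus 50 queued) and 2 rejected; once a thread finishes, there is room for one more request.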

I think it's related to ES being overwhelmed by the traffic, but I'm not entirely sure. I couldn't find anything on Discuss about this, but if there is already an open topic, I'd be more than happy to join it.

Just to be clear, my ELK stack eventually recovers but I'm wondering if there is a better way/setup in order to prevent this.

On a side note, would it help to get rid of Logstash and have ES handle my logs entirely via ingest? I tried to look into this, but I'm not sure how to use grok for the syslogs. I currently add a document_type to my logs (syslog, sidekiq or rails) and filter on it in Logstash: anything that is a syslog goes through grok, and anything that is not gets parsed as JSON.
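
As a rough sketch of the routing described above (not the actual Logstash config), the logic might look like this in Python. The regular expression stands in for a grok pattern and assumes a classic "Mon DD HH:MM:SS host program[pid]: message" syslog layout; the field names and the `parse_failure` tag are hypothetical.

```python
import json
import re

# Hypothetical stand-in for a grok pattern; assumes a classic syslog layout.
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_event(document_type, line):
    """Route a log line by document_type: regex for syslog, JSON otherwise."""
    if document_type == "syslog":
        match = SYSLOG_RE.match(line)
        if match is None:
            # Keep the raw line and tag it so failures can be found later.
            return {"message": line, "tags": ["parse_failure"]}
        return match.groupdict()
    try:
        # sidekiq and rails events are assumed to be one JSON object per line.
        return json.loads(line)
    except ValueError:
        return {"message": line, "tags": ["parse_failure"]}


event = parse_event(
    "syslog", "Jan  5 10:15:01 web01 sshd[1234]: Accepted publickey for deploy"
)
print(event["host"], event["program"], event["pid"])
```

An ingest pipeline would need the same kind of conditional split, which is the part I'm unsure how to express.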
