Hi,
I currently have a 3-node ELK stack that collects logs from ~90 microsites (30 different applications, each having 2 Rails + 1 Sidekiq). I don't really know if 3 nodes are enough, but this setup worked fine with logstash-forwarder on each of the microsites, so I think I have enough resources, especially since some of the microsites are not that heavy.
My problem started after I upgraded from ELK 2.x to ELK 5.1.2 + Filebeat 5.1.2.
I'm not sure why, but it seems like Logstash (or ES?) can't keep up with the traffic coming from Filebeat, and I get error messages like this from Logstash:
retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$6@4b94c40b on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@65358f05[Running, pool size = 8, active threads = 8, queued tasks = 50, completed tasks = 412657]]"})
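For reference, the "queue capacity = 50" in that message matches the default bulk queue size in ES 5.x, so one knob I've been eyeing is raising it in elasticsearch.yml. This is just a sketch of what I mean (the value 200 is an arbitrary number for illustration), and I realize a bigger queue only papers over the problem if the cluster is genuinely underpowered:

```yaml
# elasticsearch.yml (ES 5.x): enlarge the bulk queue so short bursts
# from Logstash get buffered instead of rejected with a 429.
# 200 is an illustrative value, not a recommendation.
thread_pool.bulk.queue_size: 200
```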
I think it's related to ES being overwhelmed by the traffic, but I'm not entirely sure. I couldn't find anything on Discuss about this, but if there's a topic open already, I'd be more than happy to join it.
Just to be clear, my ELK stack eventually recovers, but I'm wondering if there is a better way/setup to prevent this from happening at all.
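The only tuning idea I've come up with so far is throttling Logstash itself, since each pipeline worker sends its own bulk requests to ES. Something like the following in logstash.yml (values are guesses on my part, not tested recommendations):

```yaml
# logstash.yml (Logstash 5.x): fewer pipeline workers means fewer
# concurrent bulk requests competing for slots in the ES bulk queue.
# Defaults are workers = number of CPU cores and batch.size = 125;
# the values below are illustrative only.
pipeline.workers: 2
pipeline.batch.size: 125
```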
On a side note, would it help if I got rid of Logstash and handled my logs entirely in ES using ingest node? I tried to look into this, but I'm not sure how to do the grok part for the syslogs. I currently add a document_type to my logs (syslog, sidekiq, and rails) and filter on it in Logstash: anything that is syslog goes through a grok filter, and everything else gets parsed as JSON.
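To make the question concrete, here is roughly what I imagine the syslog side would look like as an ingest pipeline. This is an untested sketch: the pipeline name is mine, and the grok pattern is just the standard syslog example from the Logstash docs:

```
PUT _ingest/pipeline/syslog
{
  "description": "Parse plain syslog lines (untested sketch)",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\\[%{POSINT:syslog_pid}\\])?: %{GREEDYDATA:syslog_message}"
        ]
      }
    }
  ]
}
```

What I can't figure out is the routing: in Logstash I branch on document_type, but ingest pipelines don't seem to support conditionals in 5.x, so I assume I'd need a separate pipeline per log type and some way to pick the right one from Filebeat, and that's exactly the part I'm unsure about.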