How to improve ES responsiveness when getting the warning: retrying failed action with response code: 429

My flow: log files are processed from Filebeat > Logstash > Elasticsearch.

I am getting the two errors below many times in the Logstash logs.

Error 1:

[INFO ][logstash.outputs.elasticsearch][inventory] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [213657339][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[inventory][0]] containing [600] requests, target allocation id: AGnVTdHZSo-QDyap7mq4qg, primary term: 7 on EsThreadPoolExecutor[name = node-2/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@2f18ea8e[Running, pool size = 8, active threads = 8, queued tasks = 2539, completed tasks = 9050422]]"})

Error 2:

[INFO][logstash.outputs.elasticsearch][rtv] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [8267602870/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8266697320/7.6gb], new bytes reserved: [905550/884.3kb], usages [request=0/0b, fielddata=13447/13.1kb, in_flight_requests=3286973706/3gb, accounting=53142083/50.6mb]", "bytes_wanted"=>8267602870, "bytes_limit"=>8094194073, "durability"=>"TRANSIENT"})

Elasticsearch cluster stats:
Version: 7.6.0
Nodes: 3
Disk Available: 32.79% || 96.2 GB / 293.4 GB
JVM Heap: 51.83% || 12.3 GB / 23.8 GB
Indices: 70
Documents: 186,098,201
Disk Usage: 141.4 GB
Primary Shards: 78
Replica Shards: 78

Please advise on how I can improve ELK performance.

First, upgrade: 7.6 is EOL and no longer supported.

Second, a 429 means Elasticsearch cannot keep up with what you are sending it. Your first error shows the write thread pool queue completely full (2539 tasks queued against a capacity of 200), and your second shows the parent circuit breaker tripping because the node's heap is almost exhausted (7.6gb used against a 7.5gb limit). So you either need more resources (CPU and heap) to start, or you need to send smaller, less concurrent bulk requests.
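If you cannot add hardware right away, you can also reduce the pressure from the Logstash side. A minimal sketch of the relevant knobs in logstash.yml (the values are illustrative starting points, not tested recommendations for your cluster; they can also be set per pipeline in pipelines.yml):

  # logstash.yml -- smaller, less concurrent bulk requests put less pressure
  # on the Elasticsearch write queue and heap (illustrative values)
  pipeline.workers: 4       # defaults to the number of CPU cores; fewer workers = fewer in-flight bulks
  pipeline.batch.size: 125  # events per bulk request (125 is the default; lower it if the breaker trips)
  pipeline.batch.delay: 50  # ms to wait before flushing an under-filled batch

Multiple pipelines each running multiple workers multiply the number of simultaneous bulk requests, which is consistent with the in_flight_requests=3gb figure in your second error.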

Thanks for your reply; we will surely upgrade ELK.

But do we really need more CPU? As per the CPU usage chart, it spikes to 90%-95% for a few seconds every time Filebeat reads the data, then drops back to 1%-10%. I don't understand what we should improve now.

We have 30 pipelines running, each configured with a different log path. For each pipeline, a new log file is pushed every 2 hours, and each file contains approximately 10,000 log lines.
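Our pipelines.yml looks roughly like this (the paths are simplified, and these two ids are just examples of the 30):

  # pipelines.yml (simplified; 30 entries of this shape, one per log path)
  - pipeline.id: inventory
    path.config: "/etc/logstash/conf.d/inventory.conf"
  - pipeline.id: rtv
    path.config: "/etc/logstash/conf.d/rtv.conf"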

Filebeat config:

  scan_frequency: 30m    # how often Filebeat checks the paths for new files
  ignore_older: 73h      # ignore files last modified more than 73h ago
  close_inactive: 72h    # close a harvester after 72h without new data
  clean_inactive: 74h    # forget registry state for files older than 74h (must exceed ignore_older + scan_frequency)

Logstash/Filebeat servers: 3
Elasticsearch nodes: 3
Replicas: 1
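Would throttling the Filebeat output help smooth those spikes? For example, something like this in filebeat.yml (the hostnames are placeholders, and I have not tried these values):

  output.logstash:
    hosts: ["logstash-1:5044", "logstash-2:5044", "logstash-3:5044"]
    loadbalance: true     # spread events across all three Logstash servers
    bulk_max_size: 1024   # default is 2048; smaller batches per flush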

Please suggest how I can improve ELK performance.
