Hi Team,
My flow is: JSON log files are processed through Filebeat > Logstash > Elasticsearch, but some data is being skipped.
The thread_pool stats show many rejected write requests:
node_name  name   active  queue  rejected
node-1     write  0       0      0
node-2     write  0       0      151424
node-3     write  0       0      573
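For reference, this is the default output of the _cat thread pool API, which can be re-run at any time to watch the rejected counter grow:

GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected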
I can see the following two errors many times in the recent Logstash logs.
Error 1:
[INFO ][logstash.outputs.elasticsearch][inventory] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [213657339][indices:data/write/bulk[s][p]]: request: BulkShardRequest [inventory][0]] containing [600] requests, target allocation id: AGnVTdHZSo-QDyap7mq4qg, primary term: 7 on EsThreadPoolExecutor[name = node-2/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@2f18ea8e[Running, pool size = 8, active threads = 8, queued tasks = 2539, completed tasks = 9050422]]"})
Error 2:
[INFO][logstash.outputs.elasticsearch][rtv] retrying failed action with response code: 429 ({"type"=>"circuit_breaking_exception", "reason"=>"[parent] Data too large, data for [<transport_request>] would be [8267602870/7.6gb], which is larger than the limit of [8094194073/7.5gb], real usage: [8266697320/7.6gb], new bytes reserved: [905550/884.3kb], usages [request=0/0b, fielddata=13447/13.1kb, in_flight_requests=3286973706/3gb, accounting=53142083/50.6mb]", "bytes_wanted"=>8267602870, "bytes_limit"=>8094194073, "durability"=>"TRANSIENT"})
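Both errors are Elasticsearch pushing back: Error 1 means node-2's write thread pool queue (capacity 200) overflowed, and Error 2 means the parent circuit breaker tripped because heap usage, including roughly 3 GB of in-flight bulk requests, exceeded the 7.5 GB limit. While Filebeat is shipping a batch of files, the breaker state can be watched with the node stats API, e.g.:

GET _nodes/stats/breaker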
Filebeat YML:
scan_frequency: 30m
ignore_older: 73h
close_inactive: 72h
clean_inactive: 74h
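These options sit under a log input in filebeat.yml; the sketch below shows the shape (the path is a placeholder, not our real one, and the JSON decoding line is shown only as an example setting):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.json       # placeholder path
    json.keys_under_root: true    # example JSON decoding setting
    scan_frequency: 30m
    ignore_older: 73h
    close_inactive: 72h
    clean_inactive: 74h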
Logstash pipelines.yml:
- pipeline.id: order
path.config: "order.conf"
queue.type: persisted
pipeline.workers: 10
pipeline.batch.size: 1000
pipeline.batch.delay: 5
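Each pipeline's .conf pairs a Beats input with an Elasticsearch output along these lines (a minimal sketch; the host, port, and index are placeholders):

input {
  beats {
    port => 5044                      # placeholder port
  }
}
output {
  elasticsearch {
    hosts => ["http://node-1:9200"]   # placeholder host
    index => "order"
  }
}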
Elasticsearch configuration:
Version: 7.6.0 (we will upgrade in the near future)
Nodes: 3
Disk Available: 32.79% || 96.2 GB / 293.4 GB
JVM Heap: 51.83% || 12.3 GB / 23.8 GB
Indices: 70
Documents: 186,098,201
Disk Usage: 141.4 GB
Primary Shards: 78
Replica Shards: 78
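The figures above can be cross-checked against the cluster stats API:

GET _cluster/stats?human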
Logstash/Filebeat servers: 3
As per the CPU usage chart, it spikes to 90%-95% for a few seconds every time Filebeat reads new data, then drops back to 1%-10%.
We have 30 pipelines running, each configured with a different log path. For each pipeline, a new log file is pushed every 2 hours, and each file contains approximately 10,000 log entries.
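Assuming all 30 pipelines use the same settings as the order pipeline above, a rough worst case when several files arrive together:

30 pipelines x 10 workers = up to 300 concurrent bulk requests
each bulk request         = up to 1000 events (pipeline.batch.size)
write queue per node      = 200 slots (queue capacity from Error 1)

So a burst can offer more concurrent bulk work than the nodes will queue, which lines up with the 429 rejections above.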
Please suggest how I can improve ELK performance.