I'm running into some problems with Elasticsearch ingest.
We're running 5 data nodes on Kubernetes (the hosts have 56 cores / 64 GB RAM). Data flows Beats -> Kafka -> Logstash (also on Kubernetes) -> Elasticsearch.
The ingest rate seems to top out at 200-300 events/s, so we're lagging considerably behind our data feed.
Seeing errors like this pretty frequently:
[logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of processing of [7142251][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[filebeat-2019.08.05][0]] containing [124] requests, target allocation id: aSctrLvyTpSZkzPqVIvE0A, primary term: 2 on EsThreadPoolExecutor[name = elasticsearch-data-1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@8085181[Running, pool size = 1, active threads = 1, queued tasks = 200, completed tasks = 5601415]]"})
The index named in that error ("filebeat-2019.08.05") is about 30 GB with >40M docs, and still growing.
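For what it's worth, this is roughly how I've been watching the write thread pools while the backlog builds (a minimal sketch; the host is a placeholder for one of the data nodes):

```
# Placeholder host; shows per-node write pool activity, queue depth and rejection counts
curl -s "http://elasticsearch-data-1:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected,completed"
```

As far as I know the "queue capacity = 200" in the error is just the default write queue size, so I assume the 429s mean the data nodes can't drain bulk requests as fast as Logstash sends them.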
Questions:
- Should I start rolling this index over every hour? (A sketch of the rollover policy I have in mind is just after this list.)
- My cluster has ~2600 primary shards and ~800 indexes. Too many?
- I've split up my busier inputs into separate pipelines. On several of these I'm seeing "output" values like "1.92k ms/e". Is this latency due to something I have (mis)configured? (A trimmed sketch of the pipeline layout is at the bottom of the post.)
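On the rollover question, this is the sort of size/age-based ILM policy I've been considering instead of a fixed hourly schedule (just a sketch; the policy name and the 30gb / 1d thresholds are hypothetical, and it would still need the index template / write alias wiring, which I've left out):

```
# Hypothetical policy; name and thresholds are placeholders, not our current config
curl -XPUT "http://elasticsearch-data-1:9200/_ilm/policy/filebeat-rollover" \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "30gb", "max_age": "1d" }
        }
      }
    }
  }
}'
```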
The cluster is all on 7.3.0: 5 data nodes, 5 ingest nodes, and 3 masters.
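And for context on the split pipelines, the layout is roughly like this (a trimmed pipelines.yml sketch; the pipeline ids, paths, and worker/batch numbers are illustrative, not our exact values):

```
# pipelines.yml (illustrative values only)
- pipeline.id: busy-input-1
  path.config: "/usr/share/logstash/pipeline/busy-input-1.conf"
  pipeline.workers: 8        # per-pipeline worker threads
  pipeline.batch.size: 250   # events per worker batch, which sets the bulk size sent to the elasticsearch output
- pipeline.id: busy-input-2
  path.config: "/usr/share/logstash/pipeline/busy-input-2.conf"
  pipeline.workers: 8
  pipeline.batch.size: 250
```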