I need some help diagnosing a performance problem in my cluster.
Indexing is very slow, and both the cluster and Kibana are largely unresponsive.
Here is an overview of my cluster:
We index about 1.5 TB, or roughly 1.5 billion events, per day.
Several indices are being written to at the same time, and most of the load goes to the logstash-* index (38 shards, no replicas).
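For reference, that daily volume translates into per-second rates roughly as shown in the small Python sketch below. The helper name `ingest_rates` is hypothetical. The sketch assumes decimal units (1 TB = 10^12 bytes), attributes all of the volume to the logstash-* index, and assumes documents are spread evenly across its 38 shards. These are simplifying assumptions, and real traffic will peak above the daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rates(events_per_day: float, bytes_per_day: float, shards: int) -> dict:
    """Average ingest rates derived from daily totals (hypothetical helper).

    Assumes traffic is spread evenly over the day and evenly across shards,
    so the results are averages, not peak values.
    """
    events_per_sec = events_per_day / SECONDS_PER_DAY
    bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
    return {
        "events_per_sec": events_per_sec,
        # Decimal megabytes (1 MB = 10^6 bytes), matching 1 TB = 10^12 bytes.
        "mb_per_sec": bytes_per_sec / 1e6,
        "avg_event_bytes": bytes_per_day / events_per_day,
        "events_per_sec_per_shard": events_per_sec / shards,
    }


if __name__ == "__main__":
    # 1.5 billion events and 1.5 TB per day into 38 primary shards.
    for name, value in ingest_rates(1.5e9, 1.5e12, 38).items():
        print(f"{name}: {value:,.1f}")
    # events_per_sec: 17,361.1
    # mb_per_sec: 17.4
    # avg_event_bytes: 1,000.0
    # events_per_sec_per_shard: 456.9
```

On average, that means about 17,000 events (around 17 MB) per second, with an average event size of about 1 KB.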
Our cluster is split into two tiers:
- T1: SSD, high CPU, high RAM
- T2: HDD, medium CPU, high RAM
Here are the index settings for the logstash-* index:
This is the config of a typical T1 data node:
A screenshot of the Monitoring page for the node:
A screenshot of the Monitoring page for the logstash-* index:
Here are a few minutes of logs from the data node:
Here you can see the bulk indexing queue of several nodes:
Thanks for reading this far.
Please hit me up if you need any more information.