Problem Statement: Cluster throughput decreases as the number of indices in the cluster increases
Cluster Setup
Nodes = 8
Cores per node = 18
Memory = 90 GB
Heap = 28 GB
Version = 8.9.0
We are seeing a decrease in cluster throughput as we increase the number of indices. To verify this, we created dummy indices that contain no documents. Even though they are empty, throughput drops as the count of these indices grows.
Initially, I thought authorisation might be causing this, but I checked the hot threads and did not see anything related to RBACEngine or auth in general.
Perf Results
- Indices = 33, Wps = ~102k/s, CPU usage = 72% to 82% across data nodes
- Indices = 2465, Wps = ~80k/s, CPU usage = 60% to 75% across data nodes
I see an approximately 20% decrease in throughput after increasing the number of indices from 33 to 2465. We expect it to drop further as the number of indices grows.
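For reference, the relative drop can be computed directly from the two measurements in the perf results. This is a minimal Python sketch. The function name `percent_drop` is my own, and the inputs are the approximate Wps figures listed above.

```python
def percent_drop(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Approximate figures from the perf results above.
wps_33_indices = 102_000
wps_2465_indices = 80_000

drop = percent_drop(wps_33_indices, wps_2465_indices)
print(f"Throughput drop: {drop:.1f}%")  # prints "Throughput drop: 21.6%"
```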
Our actual prod setup -
We create daily indices for all of our clients, with a different TTL (time to live) for each client. We end up with roughly 12k to 15k indices on our 300-node cluster.
We are worried that as the number of indices increases, our throughput will decrease further.
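To illustrate how daily indices with per-client TTLs add up, here is a rough Python sketch. It assumes one index per client per day that is deleted once its TTL expires, so a client with a TTL of N days holds about N indices at steady state. The client names and TTL values are made up for illustration and are not our real configuration.

```python
def steady_state_index_count(ttl_days_by_client: dict[str, int]) -> int:
    """Estimate live indices: one daily index per client, kept for its TTL in days."""
    return sum(ttl_days_by_client.values())


# Hypothetical clients and TTLs (in days), for illustration only.
example_ttls = {"client_a": 7, "client_b": 30, "client_c": 90}

print(steady_state_index_count(example_ttls))  # prints 127
```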
Need help regarding -
- Any known issues around this?
- Any pointers to help debug such issues?
- Any ideal maximum number of indices per node? (that is, the count of unique indices across all shards present on the node)
- Any ideal maximum number of indices per cluster?
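To make the per-node definition above concrete (unique indices across all shards on a node), here is a Python sketch that computes it from rows shaped like the JSON output of the `_cat/shards` API (`GET _cat/shards?format=json`). It assumes each row carries `index` and `node` fields. The sample rows and the function name are made up for illustration, and fetching the data from the cluster is left out.

```python
from collections import defaultdict


def unique_indices_per_node(shard_rows: list[dict]) -> dict[str, int]:
    """Count distinct indices that have at least one shard on each node."""
    indices_by_node: dict[str, set[str]] = defaultdict(set)
    for row in shard_rows:
        node = row.get("node")
        if not node:  # skip unassigned shards, which have no node
            continue
        indices_by_node[node].add(row["index"])
    return {node: len(indices) for node, indices in indices_by_node.items()}


# Made-up sample rows, for illustration only.
sample_rows = [
    {"index": "logs-a", "shard": "0", "prirep": "p", "node": "node-1"},
    {"index": "logs-a", "shard": "1", "prirep": "p", "node": "node-1"},
    {"index": "logs-a", "shard": "0", "prirep": "r", "node": "node-2"},
    {"index": "logs-b", "shard": "0", "prirep": "p", "node": "node-1"},
    {"index": "logs-c", "shard": "0", "prirep": "p", "node": None},
]

print(unique_indices_per_node(sample_rows))
# prints {'node-1': 2, 'node-2': 1}
```

Two shards of the same index on one node count once, which matches the "unique indices" wording in the question.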