We're facing some performance issues.
The cluster is built from 9 nodes:
3x data nodes (on NVMe disks)
index shard prirep state docs node
index_with_data 2 r STARTED 10590253 elk_es_data-1
index_with_data 2 p STARTED 10590253 elk_es_data-2
index_with_data 1 p STARTED 10470947 elk_es_data-0
index_with_data 1 r STARTED 10470947 elk_es_data-2
index_with_data 0 p STARTED 10483501 elk_es_data-1
index_with_data 0 r STARTED 10483501 elk_es_data-0
We have 3 primary shards, each with 1 replica, on these data nodes.
Which parameter controls the upper limit on how many requests can be handled?
When I'm testing with 400 rps it goes smoothly, but at 500 rps requests start failing with:
"The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time."
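A 429 under search load usually means the search thread pool queue has filled up and requests are being rejected. One way to confirm this, assuming you can run the _cat APIs against any node, is a sketch like:
GET _cat/thread_pool/search?v&h=node_name,active,queue,queue_size,rejected,completed
If the rejected column is climbing on the data nodes while you push 500 rps, the limit you are hitting is the search thread pool and its queue rather than a single tunable request-rate setting.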
index shard prirep state docs store ip node
index_with_data 2 r STARTED 10590253 9.5gb 10.124.226.245 elk_es_data-1
index_with_data 2 p STARTED 10590253 9.4gb 10.124.227.207 elk_es_data-2
index_with_data 1 p STARTED 10470947 9.6gb 10.124.229.38 elk_es_data-0
index_with_data 1 r STARTED 10470947 9.5gb 10.124.227.207 elk_es_data-2
index_with_data 0 p STARTED 10483501 9.5gb 10.124.226.245 elk_es_data-1
index_with_data 0 r STARTED 10483501 9.8gb 10.124.229.38 elk_es_data-0
What is the specification of the nodes with respect to RAM, heap size and CPU?
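One way to pull those figures straight from the cluster, as a sketch (trim the columns as needed):
GET _cat/nodes?v&h=name,node.role,heap.max,ram.max,cpu
GET _nodes/os
The second call reports available_processors and allocated_processors per node, which shows how many cores Elasticsearch actually sees.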
Elasticsearch will use the heap assigned, but having more heap assigned than necessary will not necessarily improve performance. In addition to the heap, Elasticsearch also stores some data off-heap. The rest of the memory is used by the operating system page cache to cache frequently accessed files, and as I pointed out in the linked thread it is important to make sure your full data set fits in the page cache for high query concurrency use cases in order to avoid disk I/O to the greatest extent possible. The JVM heap graphs indicate that you can reduce the heap size, so that is what I would do. Try lowering it from 21GB to e.g. 14GB and see if that makes any difference. Also make sure that Elasticsearch has access to all the memory it has configured. You do not want to overprovision and risk having parts of the memory swapped out to disk.
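If you do lower the heap, a minimal sketch of how to apply it, assuming a standard package install (adjust paths to your setup):
# config/jvm.options.d/heap.options
-Xms14g
-Xmx14g
# config/elasticsearch.yml - keep the heap from being swapped out
bootstrap.memory_lock: true
Note that bootstrap.memory_lock also requires the memlock ulimit to be raised for the user running Elasticsearch; alternatively, disable swap on the host entirely.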
The minimum query latency you can achieve will depend on the shard size, the data and the queries run. If possible I would recommend trying to reduce the primary shard count to 1 so a single shard can serve a query all by itself.
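Going from 3 primary shards to 1 cannot be done in place, but the shrink API can do it with a short write block (a sketch; index_with_data_1shard is just an example target name, and elk_es_data-0 is simply one of the data nodes from the listing above):
# 1. Move a copy of every shard to one node and block writes
PUT /index_with_data/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "elk_es_data-0",
    "index.blocks.write": true
  }
}
# 2. Shrink into a new single-shard index and clear the temporary settings on the target
POST /index_with_data/_shrink/index_with_data_1shard
{
  "settings": {
    "index.number_of_shards": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}
Reindexing into a new index created with a single primary shard is the alternative if you cannot block writes.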
You seem to have a very low level of CPU configured. If you are able to support 400 concurrent queries with this, your queries must be very simple. Elasticsearch sizes thread pools based on how many CPU cores are available, so I would recommend increasing this and allocating a full 8 CPU cores to each data node. I would bump the CPU allocation of the master nodes as well; you do not want them to be starved of CPU when they actually need it. As with memory, make sure Elasticsearch has access to all the CPU resources that are configured and do not overprovision.
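If the data nodes run in containers or under a hypervisor CPU limit, it is also worth checking that Elasticsearch detects all 8 cores, since the detected count is what drives the thread pool sizes. A sketch, assuming the autodetected value does not match the allocation:
# config/elasticsearch.yml on each data node
node.processors: 8
# verify what Elasticsearch thinks it has
GET /_nodes/os?filter_path=nodes.*.os.available_processors,nodes.*.os.allocated_processors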
I was misled by those limits.
Each data node is allocated a full 8 cores, of which, in practice, 2 cores are taken up by the OS.
At 500 rps, as you can see, these resources are saturated almost evenly, but the master nodes consume very little CPU. So do you suggest at least moving to 1 primary shard with 2 replicas on the 3 data nodes?
Master nodes are not involved in request processing, so queries should be sent directly to the data nodes. If having a single primary shard for the index still allows you to meet your latency requirements, I would recommend changing to that, as it will allow each query to be served by a single shard, which reduces the number of threads involved in serving each query.
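For reference, the replica count can be changed on the fly, while the primary count requires a shrink or reindex as sketched earlier:
PUT /index_with_data/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}
With 1 primary and 2 replicas on 3 data nodes, every node holds a full copy of the index, so any node can serve a query without fanning out to the others.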
It is also worth looking into how you are benchmarking the cluster. Are you using realistic queries with a realistic distribution? Are you using persistent connections to Elasticsearch? How many concurrent connections/threads are you using?
I have not used Gatling. Given the total number of connections reported opened, it does not look like it is using persistent connections, which makes a big difference. If you were using any of the official language clients to interact with Elasticsearch, they would use persistent connections, so your benchmark may not be representative.
All official Elasticsearch clients use connection pools with long-running persistent connections, as establishing new connections adds a significant amount of overhead, especially when TLS is used. Gatling uses HTTP, but if it creates a new connection per request it will add more load to Elasticsearch than the standard clients would, and if that is the case the benchmark may not be realistic.
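One way to see this from the Elasticsearch side is the HTTP section of the node stats: with persistent connections, total_opened stays close to current_open, while a benchmark that opens a new connection per request makes total_opened climb with every request. A sketch:
GET /_nodes/stats/http?filter_path=nodes.*.name,nodes.*.http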
Another interesting test is to spin up two Gatling nodes and run them in parallel against Elasticsearch, to confirm that Elasticsearch, and not Gatling, is the bottleneck.