We're facing some performance issues.
The cluster is built from 9 nodes:
3x data nodes (on NVMe disks)
index shard prirep state docs node
index_with_data 2 r STARTED 10590253 elk_es_data-1
index_with_data 2 p STARTED 10590253 elk_es_data-2
index_with_data 1 p STARTED 10470947 elk_es_data-0
index_with_data 1 r STARTED 10470947 elk_es_data-2
index_with_data 0 p STARTED 10483501 elk_es_data-1
index_with_data 0 r STARTED 10483501 elk_es_data-0
We have 3 primary shards, each with 1 replica, on these data nodes.
Which parameter controls the upper limit on how many requests can be handled?
When I'm testing with 400 rps it goes smoothly, but at 500 rps requests start failing with:
"The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time."
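A 429 under search load usually means the search thread pool queue has filled up and requests are being rejected. One way to confirm this, assuming you can run the _cat APIs against any node, is a sketch like:
GET _cat/thread_pool/search?v&h=node_name,active,queue,queue_size,rejected,completed
If the rejected column is climbing on the data nodes while you push 500 rps, the limit you are hitting is the search thread pool and its queue rather than a single tunable request-rate setting.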
index shard prirep state docs store ip node
index_with_data 2 r STARTED 10590253 9.5gb 10.124.226.245 elk_es_data-1
index_with_data 2 p STARTED 10590253 9.4gb 10.124.227.207 elk_es_data-2
index_with_data 1 p STARTED 10470947 9.6gb 10.124.229.38 elk_es_data-0
index_with_data 1 r STARTED 10470947 9.5gb 10.124.227.207 elk_es_data-2
index_with_data 0 p STARTED 10483501 9.5gb 10.124.226.245 elk_es_data-1
index_with_data 0 r STARTED 10483501 9.8gb 10.124.229.38 elk_es_data-0
What is the specification of the nodes with respect to RAM, heap size and CPU?
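One way to pull those figures straight from the cluster, as a sketch (trim the columns as needed):
GET _cat/nodes?v&h=name,node.role,heap.max,ram.max,cpu
GET _nodes/os
The second call reports available_processors and allocated_processors per node, which shows how many cores Elasticsearch actually sees.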
Elasticsearch will use the heap assigned, but having more heap assigned than necessary will not necessarily improve performance. In addition to the heap, Elasticsearch also stores some data off-heap. The rest of the memory is used by the operating system page cache to cache frequently accessed files, and as I pointed out in the linked thread it is important to make sure your full data set fits in the page cache for high query concurrency use cases in order to avoid disk I/O to the greatest extent possible. The JVM heap graphs indicate that you can reduce the heap size, so that is what I would do. Try lowering it from 21GB to e.g. 14GB and see if that makes any difference. Also make sure that Elasticsearch has access to all the memory it has configured. You do not want to overprovision and risk having parts of the memory swapped out to disk.
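If you do lower the heap, a minimal sketch of how to apply it, assuming a standard package install (adjust paths to your setup):
# config/jvm.options.d/heap.options
-Xms14g
-Xmx14g
# config/elasticsearch.yml - keep the heap from being swapped out
bootstrap.memory_lock: true
Note that bootstrap.memory_lock also requires the memlock ulimit to be raised for the user running Elasticsearch; alternatively, disable swap on the host entirely.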
The minimum query latency you can achieve will depend on the shard size, the data and the queries run. If possible I would recommend trying to reduce the primary shard count to 1 so a single shard can serve a query all by itself.
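Going from 3 primary shards to 1 cannot be done in place, but the shrink API can do it with a short write block (a sketch; index_with_data_1shard is just an example target name, and elk_es_data-0 is simply one of the data nodes from the listing above):
# 1. Move a copy of every shard to one node and block writes
PUT /index_with_data/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "elk_es_data-0",
    "index.blocks.write": true
  }
}
# 2. Shrink into a new single-shard index and clear the temporary settings on the target
POST /index_with_data/_shrink/index_with_data_1shard
{
  "settings": {
    "index.number_of_shards": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}
Reindexing into a new index created with a single primary shard is the alternative if you cannot block writes.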
You seem to have a very low level of CPU configured. If you are able to support 400 concurrent queries with this, your queries must be very simple. Elasticsearch sizes thread pools based on how many CPU cores are available, so I would recommend increasing this and allocating a full 8 CPU cores to each data node. I would bump the CPU allocation of the master nodes as well; you do not want them to be starved of CPU when they actually need it. As with memory, make sure Elasticsearch has access to all the CPU resources that are configured and do not overprovision.
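If the data nodes run in containers or under a hypervisor CPU limit, it is also worth checking that Elasticsearch detects all 8 cores, since the detected count is what drives the thread pool sizes. A sketch, assuming the autodetected value does not match the allocation:
# config/elasticsearch.yml on each data node
node.processors: 8
# verify what Elasticsearch thinks it has
GET /_nodes/os?filter_path=nodes.*.os.available_processors,nodes.*.os.allocated_processors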
I was misled by those limits.
Each data node is allocated a full 8 cores, of which, in practice, 2 cores are taken up by the OS.
At 500 rps, as you can see, these resources are saturated almost evenly, but the master nodes consume very little CPU. So do you suggest at least moving to 1 primary shard with 2 replicas on the 3 data nodes?
Master nodes are not involved in request processing, so queries should be sent directly to the data nodes. If having a single primary shard for the index still allows you to meet your latency requirements, I would recommend changing to that, as it will allow each query to be served by a single shard, which reduces the number of threads involved in serving each query.
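For reference, the replica count can be changed on the fly, while the primary count requires a shrink or reindex as sketched earlier:
PUT /index_with_data/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}
With 1 primary and 2 replicas on 3 data nodes, every node holds a full copy of the index, so any node can serve a query without fanning out to the others.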
It is also worth looking into how you are benchmarking the cluster. Are you using realistic queries with a realistic distribution? Are you using persistent connections to Elasticsearch? How many concurrent connections/threads are you using?
I have not used Gatling. Given the total number of connections reported opened, it does not look like it is using persistent connections, which makes a big difference. If you were using any of the official language clients to interact with Elasticsearch, they would use persistent connections, so your benchmark may not be representative.
All official Elasticsearch clients use connection pools with long-running persistent connections, as establishing new connections adds a significant amount of overhead, especially when TLS is used. Gatling uses HTTP, but if it creates a new connection per request it will add more load to Elasticsearch than the standard clients would, and if that is the case the benchmark may not be realistic.
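One way to see this from the Elasticsearch side is the HTTP section of the node stats: with persistent connections, total_opened stays close to current_open, while a benchmark that opens a new connection per request makes total_opened climb with every request. A sketch:
GET /_nodes/stats/http?filter_path=nodes.*.name,nodes.*.http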
Another interesting test is to spin up two Gatling nodes and run them in parallel against Elasticsearch, to confirm that Elasticsearch, and not Gatling, is the bottleneck.