Check performance of cluster

Hi
We're facing some performance issue
cluster build on 9 nodes
3x ingest
3x master
3x data (on NVMe disks)

index_with_data 2 r STARTED 10590253 elk_es_data-1
index_with_data 2 p STARTED 10590253 elk_es_data-2
index_with_data 1 p STARTED 10470947 elk_es_data-0
index_with_data 1 r STARTED 10470947 elk_es_data-2
index_with_data 0 p STARTED 10483501 elk_es_data-1
index_with_data 0 r STARTED 10483501 elk_es_data-0

we have 3 primary shard with 1 replica on these data nodes

which one parameter are handling for high limit of response

when I'm testing with 400 rps it goes smoothly but 500 rps are forcing "
The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time
"

and to much of thread_pool rejected

id                     name                                   active rejected completed
zenaffBuRLS0QL7_4RsXnw analyze                                     0        0         0
zenaffBuRLS0QL7_4RsXnw auto_complete                               0        0         0
zenaffBuRLS0QL7_4RsXnw ccr                                         0        0         0
zenaffBuRLS0QL7_4RsXnw fetch_shard_started                         0        0        31
zenaffBuRLS0QL7_4RsXnw fetch_shard_store                           0        0        69
zenaffBuRLS0QL7_4RsXnw flush                                       0        0      8713
zenaffBuRLS0QL7_4RsXnw force_merge                                 0        0         0
zenaffBuRLS0QL7_4RsXnw generic                                     0        0   4112775
zenaffBuRLS0QL7_4RsXnw get                                         0        0         0
zenaffBuRLS0QL7_4RsXnw listener                                    0        0         0
zenaffBuRLS0QL7_4RsXnw management                                  1        0   9762084
zenaffBuRLS0QL7_4RsXnw ml_datafeed                                 0        0         0
zenaffBuRLS0QL7_4RsXnw ml_job_comms                                0        0         0
zenaffBuRLS0QL7_4RsXnw ml_utility                                  0        0   1756028
zenaffBuRLS0QL7_4RsXnw refresh                                     0        0  38771823
zenaffBuRLS0QL7_4RsXnw rollup_indexing                             0        0         0
zenaffBuRLS0QL7_4RsXnw search                                     13   240381   9825986
zenaffBuRLS0QL7_4RsXnw search_coordination                         0        0      2341
zenaffBuRLS0QL7_4RsXnw search_throttled                            0        0         0
zenaffBuRLS0QL7_4RsXnw searchable_snapshots_cache_fetch_async      0        0         0
zenaffBuRLS0QL7_4RsXnw searchable_snapshots_cache_prewarming       0        0         0
zenaffBuRLS0QL7_4RsXnw security-crypto                             0        0        50
zenaffBuRLS0QL7_4RsXnw security-token-key                          0        0         0
zenaffBuRLS0QL7_4RsXnw snapshot                                    0        0         0
zenaffBuRLS0QL7_4RsXnw snapshot_meta                               0        0         0
zenaffBuRLS0QL7_4RsXnw system_critical_read                        0        0        82
zenaffBuRLS0QL7_4RsXnw system_critical_write                       0        0         0
zenaffBuRLS0QL7_4RsXnw system_read                                 0        0   2381915
zenaffBuRLS0QL7_4RsXnw system_write                                0        0   1669425
zenaffBuRLS0QL7_4RsXnw vector_tile_generation                      0        0         0
zenaffBuRLS0QL7_4RsXnw warmer                                      0        0  10483218
zenaffBuRLS0QL7_4RsXnw watcher                                     0        0         0
zenaffBuRLS0QL7_4RsXnw write                                       0        0   1267415

What is the size of the index?

What is the specification of the nodes with respect to RAM, heap size and CPU?

Which version of Elasticsearch are you using?

Hi @Christian_Dahlqvist

What is the size of the index?

index_with_data                                2 r STARTED 10590253   9.5gb 10.124.226.245 elk_es_data-1
index_with_data                                2 p STARTED 10590253   9.4gb 10.124.227.207 elk_es_data-2
index_with_data                                1 p STARTED 10470947   9.6gb 10.124.229.38  elk_es_data-0
index_with_data                                1 r STARTED 10470947   9.5gb 10.124.227.207 elk_es_data-2
index_with_data                                0 p STARTED 10483501   9.5gb 10.124.226.245 elk_es_data-1
index_with_data                                0 r STARTED 10483501   9.8gb 10.124.229.38  elk_es_data-0

What is the specification of the nodes with respect to RAM, heap size and CPU?

spec for data node

  Limits:                                                                                                                                                                                                                                                        │
│       cpu:     100m                                                                                                                                                                                                                                                

  Limits:                                                                                                                                                                                                                                                        │
│       memory:  42Gi                                                                                                                                                                                                                                                

so heap has 21 Gi

spec for master nodes

  Limits:                                                                                                                                                                                                                                                        │
│       cpu:     100m                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            │
│       memory:  4Gi

we're working on ES 7.17.8



I saw that you found my comments on a very similar thread. Let me provide some further details based on the information you provided.

Elasticsearch will use the heap assigned, but having more heap assigned than necessary will not necessarily improve performance. In addition to the heap Elasticsearch also stores some data off-heap. The rest of the memory is used by the operating system page cache to cache frequently accessed files and as I pointed out in the linked thread it is important to make sure your full data set fits in the page cache for high query concurrency use cases in order to avoid disk I/O to the greatest extent possible. The JVM heap graphs indicate that you can reduce the heap size, so that is what I would do. Try lowering it from 21GB to e.g. 14GB and see if that makes any difference. Also make sure that Elasticsearch has access to all the memory it has configured. You do not want to overprovision and risk having parts of the memory swapped out to disk.

The minimum query latency you can achieve will depend on the shard size, the data and the queries run. If possible I would recommend trying to reduce the primary shard count to 1 so a single shard can serve a query all by itself.

You seem to have a very low level of CPU configured. If you are able to support 400 concurrent queries with this your queries must be very simple. Elasticsearch sizes threadpools based on how many CPU cores are available so I would recommend increasing this and allocate a full 8 CPU cores to each data node. I would bump the CPU allocation of the master nodes as well. You do not want them to be starved of CPU when they actually need it. As with memory, make sure Elasticsearch has access to all the CPU resources that are configured and do not overprovision.

@Christian_Dahlqvist Thanks for replay

I was misled with these limits.
Each data node is allocated a full 8 cores of which, in practice, 2 cores are taken to support the OS.
During 500rps requests, as you can see, these resources are saturated almost evenly. But Master nodes consume very little CPU. So at least you suggest making 1 primary shard with 2 replicas on 3 nodes?





Master nodes are not involved in request processing so queries should be directed directly at data nodes. If having a single primary shard for the index still allows you to meet your latency requirements, I would recommend changing to that as it will allow each query to be served by a single shard, which reduces the number of threads involved in serving each query.

You may also want to try sending your queries with a _local preference.

It is also worth looking into how you are benchmarking the cluster. Are you using realistic queries with a realistic distribution? Are you using persistent connections to Elasticsearch? How many concurrent connections/threads are you using?

This cluster is testing through gatling test scenario. We have dedicated query to this database, so we 're trying to reach high limit but 400 is not the optimise results

node_name                     name                                   active queue rejected
elasticsearch-es-eck-data-0   analyze                                     0     0        0
elasticsearch-es-eck-data-0   auto_complete                               0     0        0
elasticsearch-es-eck-data-0   ccr                                         0     0        0
elasticsearch-es-eck-data-0   fetch_shard_started                         0     0        0
elasticsearch-es-eck-data-0   fetch_shard_store                           0     0        0
elasticsearch-es-eck-data-0   flush                                       0     0        0
elasticsearch-es-eck-data-0   force_merge                                 0     0        0
elasticsearch-es-eck-data-0   generic                                     0     0        0
elasticsearch-es-eck-data-0   get                                         0     0        0
elasticsearch-es-eck-data-0   listener                                    0     0        0
elasticsearch-es-eck-data-0   management                                  1     0        0
elasticsearch-es-eck-data-0   ml_datafeed                                 0     0        0
elasticsearch-es-eck-data-0   ml_job_comms                                0     0        0
elasticsearch-es-eck-data-0   ml_utility                                  0     0        0
elasticsearch-es-eck-data-0   refresh                                     0     0        0
elasticsearch-es-eck-data-0   rollup_indexing                             0     0        0
elasticsearch-es-eck-data-0   search                                     13   139   285619


      ],
      "attributes" : {
        "k8s_node_name" : "aks-nvme-16105314-vmss000007",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "http" : {
        "current_open" : 15,
        "total_opened" : 678801,
        "clients" : [

I have not used Gatling. Given the total number of connections reported opened it does not look like it is using persistent connections, which makes a big difference. If you would be using any of the official language clients to interact with Elasticsearch they would use persistent connections, so your benchmark may not be representative.

If you can not configure Gatling to use persistent connections I would recommend having a look at Rally, a tool specifically designed for benchmarking Elasticsearch.

but we need to consider the test environment very similar for the client mode (from the user perspective)
so that's why we're using such tool with configure some of search request

and we have very similar output as You have in Rally

Screenshot 2023-05-10 at 10.28.26

I do not understand. Which client library will you use to communicate with Elasticsearch once it is in production? Would this connect to Elasticsearch the same way Gatling does?

I am not arguing about output or results but rather the details of how the benchmark is performed and whether this is realistic or not.

gatling test use http to connect directly to elasticsearch data service

Yes it is as realistic as possible that is why it was used here. On the other hand, we notice as if there was some kind of service blockade from 500 rps upwards

All official Elasticsearch clients use connection pools with long-running persistent connections as establishing new connections add a significant amount of overhead, especially when TLS is used. Gatling uses HTTP, but if it creates a new connection per request it will add more load to Elasticsearch that the standard clients would, and if this is the case the benchmark may not be realistic.

Another interesting test to run is to spin up two gatling nodes and run these in parallel against Elasticsearch to ensure Elasticsearch and not Gatling is the bottleneck.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.