Getting less search speed

We have a cluster consisting of 3 masters (4 core, 16 GB RAM each), 3 hot(8 core, 32 GB RAM, 300 GB SSD each), and 3 warm nodes(8 core, 32GB RAM, 1.5TB HDD each).

We have one index for each month of year following the naming convention of voucher_YYYY_MMM(eg voucher_2021_JAN). and all these indexes have an alias voucher which acts as a read alias and our search query is directed towards this read alias.

Our index resides on the hot nodes for 32 days, and that is the period it will receive 99% of writes. Our estimate data is approximately 480 million docs in this index, it has 1 replica and 16 shards( we have taken 16 shards because eventually, our data will grow, right now we are thinking of shrinking down to 8 shards each shard with 30 GB of data, as per our mapping 2 million docs are taking 1GB of space).

After 32 days index will move to the warm nodes, currently, we have 450 million in our hot index and 1.8 billion documents collectively in our warm indexes. The total comes up to 2.25 billion docs.

Our doc contains customer id and some fields on which we are applying filters, they all are mapped as keyword types, we are using custom routing on customer id for improving our search speed.

our typical query looks like

GET voucher/_search?routing=1000636779&search_type=query_then_fetch
  "from": 0,
  "size": 20,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
              "term": {
                "uId": {
                  "value": "1000636779",
                  "boost": 1
              "terms": {
                "isGift": [
  "version": true,
  "sort": [
      "cdInf.crtdAt": {
        "order": "desc"

We are using a constant score query because we don't want to score our documents and want to increase search speed.

We have 13 search threads on each of our hot and warm nodes and we are sending requests to our master node for indexing and searching.

we are sending 100 search requests per second and getting an average search response time of about 3.5 seconds, where max time is going up to 9 seconds.

I am not understanding what are we missing, why is our search performance so poor.

You can try profiling your queries and see where time is spent. See Profile queries and aggregations | Kibana Guide [8.1] | Elastic

Have you tried getting rid of the constant score query and just use filters in the bool query as those do not compute a score either (specifying no query is an explicit match all query giving all documents a score of 1.0)?

In general without more details this is hard to help. How much of that data can be cached in the page cache/file system cache? Is single threaded performance also poor or is performance going bad under a certain load? Is the uId field an integer or a keyword field?

if you are using routing, then having smaller shards might be a good idea to reduce what needs to be searched, as you are always searching a single shard - but again that might be verified best with testing?

Another thing to take a look at is the hot threads output and see where time is spent. See Nodes hot threads API | Elasticsearch Guide [8.1] | Elastic

Hope this helps as a start to gather more information. Also the Elasticsearch and JVM versions might help as a start.

Thank you for your reply,

We have shrunk down our data to 1.1 billion, the below observations are based on that amout.

I have tried profiling, but it didn't help much. response time even for single-threaded requests varies from 200-400 millis, and I have tried with using different customer ids and there is almost equal distribution of documents between customer ids(we are trying with mock data before moving to prod.)

On a single request, the search timing lies between 200-1000 millis(that's in itself is quite a deviation as we almost have equally distributed data based on customer id), but the problems start on heavy loads.

I have also tried using filters with boolean queries, it didn't make much difference, I have sampled 8000 requests 4 times, and it's a draw. I have also tried using the must with the term in the bool query but that makes the query slower.

Caching will not be much of a help with our use case, there will only be a few instances in which there is a repeat request made with the same customer id in quick succession for caching to took over. Although query cache once populated is taking down the response time to 100 millis but we have a little use for that.

uId field is a keyword.

As for sharding, every index has 8 shards and we have 12 indexes which mean we have to search 12 shards for each request, as customer ids will be distributed in all of the indexes(that's why we are using read alias), although custom routing is helping a lot.

Elastic version 7.15.2 and JVM version is 17.

Also, I was trying to find more info about what's going on, and when I hit _cat/nodes API while there was a search load I have got the below response.

ip             heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name           56          99  17    0.73    0.79     1.48 his       -      voucherelasticsearch-gv-v1-247-161           57          62   6    0.32    0.13     0.09 im        -      voucherelasticsearch-gv-v1-245-134            32          69   0    0.00    0.00     0.00 -         -      voucherelasticsearch-gv-v1-245-52             66          99   3    0.83    1.04     2.05 iw        -      voucherelasticsearch-gv-v1-250-1            9          63   7    0.05    0.01     0.00 im        *      voucherelasticsearch-gv-v1-251-138           26          99   4    0.51    0.43     1.38 his       -      voucherelasticsearch-gv-v1-249-165            62          99   4    0.96    1.42     4.81 iw        -      voucherelasticsearch-gv-v1-249-68            11          69   0    0.00    0.00     0.00 -         -      voucherelasticsearch-gv-v1-250-88           62          99  15    1.05    0.95     1.65 his       -      voucherelasticsearch-gv-v1-248-141           20          99  10    0.85    1.06     2.73 iw        -      voucherelasticsearch-gv-v1-245-156           40          62   5    0.32    0.11     0.04 im        -      voucherelasticsearch-gv-v1-248-106

the ram percent is 99 for some nodes, I don't think that's a good thing

Also, I am trying to gather more information but everything seems chaotic, there is little to no reproducible metrics, I have checked CPU utilization while querying, search queues, but I haven't found more than 10 requests in the search queues, and that too just for a fraction of second.

Also, we have index sorting in place, on the same field on which we are applying sort in our queries, Most of the fields are keywords or boolean in our index(two of them are text but we are not querying those right now).

You should not send data to the master nodes and the master nodes should probably ideally also not be ingest nodes. Instead send requests to your coordinating nodes or directly to the data nodes.

If you slowly increase query concurrency, how does the latency grow?

What time period are you targeting with the queries? Are the queries targeting both hot and warm nodes? If you only query a recent time period where all matching data can be found on the hot nodes, how does that affect query latency for different concurrency levels?

That is not a problem - it just means that the operating system page cache is in use.

I have not used this, but if I recall correctly it adds a lot of overhead when indexing data.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.