Elasticsearch timeouts/requests taking too long

Elastic.Clients.Elasticsearch version:
8.12.0
Elasticsearch version:
8.x.x
.NET runtime version:
8

Description of the problem including expected versus actual behavior:
Under load tests (around 30000-40000 requests per minute), Elasticsearch starts timing out or takes too long to reply (easily up to 80 seconds at that load).

This is the query I'm running:
{
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "productVariants.productVariantOfferPrices.offerPrice",
            "query": {
              "range": {
                "productVariants.productVariantOfferPrices.offerPrice.validFrom": {
                  "lte": "2024-03-25T09:10:39"
                }
              }
            }
          }
        },
        {
          "nested": {
            "path": "productVariants.productVariantOfferPrices.offerPrice",
            "query": {
              "range": {
                "productVariants.productVariantOfferPrices.offerPrice.validTo": {
                  "gte": "2024-03-25T09:10:39"
                }
              }
            }
          }
        },
        {
          "nested": {
            "path": "storeFront",
            "query": {
              "term": {
                "storeFront.searchUrls": {
                  "case_insensitive": true,
                  "value": "mytest"
                }
              }
            }
          }
        }
      ]
    }
  },
  "size": 20,
  "sort": [
    {
      "relevance": {
        "nested": {},
        "order": "desc"
      }
    }
  ]
}

I have around 91045 documents (with a lot of nested objects) and I'm paginating the results 20 per page. I can't get rid of the nested fields for company reasons. There isn't much information about how nested queries penalize performance, so here are a few questions to help me understand the situation better:

  • Would combining the two nested queries into one (which I could do) give better performance, or does Elasticsearch already do that under the hood?
  • Besides using filters to improve performance (without removing the nested fields), is there anything else I could do?
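
For reference, the combined form mentioned in the first question could be sketched as below. This is only an illustration in Python that builds the request body as a plain dict; the function name build_search_body is hypothetical, and the field names and sort clause are copied from the query above. It moves every clause into bool.filter and puts both range conditions inside a single nested clause. Note that this is not strictly equivalent to the original: with one nested clause, both ranges must match on the same offerPrice object, whereas two separate nested clauses can each be satisfied by a different nested object. Whether it is actually faster would have to be measured, for example with "profile": true on the search request.

```python
import json

OFFER_PATH = "productVariants.productVariantOfferPrices.offerPrice"


def build_search_body(now_iso, search_url, size=20, offset=0):
    """Build the search body with every clause in filter context (hypothetical helper)."""
    # Both range conditions live in one nested clause, so they must hold
    # for the same nested offerPrice object.
    offer_filter = {
        "nested": {
            "path": OFFER_PATH,
            "query": {
                "bool": {
                    "filter": [
                        {"range": {OFFER_PATH + ".validFrom": {"lte": now_iso}}},
                        {"range": {OFFER_PATH + ".validTo": {"gte": now_iso}}},
                    ]
                }
            },
        }
    }
    store_filter = {
        "nested": {
            "path": "storeFront",
            "query": {
                "term": {
                    "storeFront.searchUrls": {
                        "case_insensitive": True,
                        "value": search_url,
                    }
                }
            },
        }
    }
    return {
        "from": offset,
        "size": size,
        "query": {"bool": {"filter": [offer_filter, store_filter]}},
        # Sort clause copied unchanged from the original query.
        "sort": [{"relevance": {"nested": {}, "order": "desc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("2024-03-25T09:10:39", "mytest"), indent=2))
```

Since the results are sorted on the relevance field rather than on _score, dropping scoring by using filter context should not change the ordering.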

(I had opened a GitHub issue, "Multiple timeouts/requests taking too long", Issue #107374 in elastic/elasticsearch, and was redirected here.)

Any help is appreciated

What is the size and configuration of your cluster? What is the CPU allocation, memory and heap size of the nodes? What type of storage are you using?

How large are the index/indices you are querying using the sample search you provided? How many primary and replica shards? How many documents do the primary shards contain? What is the average size of the documents?

Are you indexing or updating at the same time you are querying? If so, at what rate?

The cluster is run by a different team, but here are the configs I was able to pull:

{
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "docs" : {
        "count" : 90631,
        "deleted" : 2267
      },
      "shard_stats" : {
        "total_count" : 1
      },
      "store" : {
        "size_in_bytes" : 10714136,
        "total_data_set_size_in_bytes" : 10714136,
        "reserved_in_bytes" : 0
      },
      "indexing" : {
        "index_total" : 1438,
        "index_time_in_millis" : 1832,
        "index_current" : 0,
        "index_failed" : 0,
        "delete_total" : 22,
        "delete_time_in_millis" : 24,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0,
        "write_load" : 5.735608652774992E-6
      },
      "get" : {
        "total" : 0,
        "time_in_millis" : 0,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 0,
        "missing_time_in_millis" : 0,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 94,
        "query_time_in_millis" : 216,
        "query_current" : 0,
        "fetch_total" : 57,
        "fetch_time_in_millis" : 3594,
        "fetch_current" : 0,
        "scroll_total" : 0,
        "scroll_time_in_millis" : 0,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 0,
        "current_docs" : 0,
        "current_size_in_bytes" : 0,
        "total" : 4,
        "total_time_in_millis" : 828,
        "total_docs" : 100215,
        "total_size_in_bytes" : 13408159,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 0,
        "total_auto_throttle_in_bytes" : 20971520
      },
      "refresh" : {
        "total" : 457,
        "total_time_in_millis" : 3694,
        "external_total" : 253,
        "external_total_time_in_millis" : 4159,
        "listeners" : 0
      },
      "flush" : {
        "total" : 200,
        "periodic" : 200,
        "total_time_in_millis" : 10488
      },
      "warmer" : {
        "current" : 0,
        "total" : 252,
        "total_time_in_millis" : 72
      },
      "query_cache" : {
        "memory_size_in_bytes" : 172390,
        "total_count" : 823,
        "hit_count" : 116,
        "miss_count" : 707,
        "cache_size" : 22,
        "cache_count" : 30,
        "evictions" : 8
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 6,
        "global_ordinals" : {
          "build_time_in_millis" : 32
        }
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 7,
        "memory_in_bytes" : 0,
        "terms_memory_in_bytes" : 0,
        "stored_fields_memory_in_bytes" : 0,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 0,
        "points_memory_in_bytes" : 0,
        "doc_values_memory_in_bytes" : 0,
        "index_writer_memory_in_bytes" : 0,
        "version_map_memory_in_bytes" : 0,
        "fixed_bit_set_memory_in_bytes" : 95056,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 0,
        "size_in_bytes" : 55,
        "uncommitted_operations" : 0,
        "uncommitted_size_in_bytes" : 55,
        "earliest_last_modified_age" : 681709
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 3,
        "miss_count" : 20
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      },
      "bulk" : {
        "total_operations" : 161,
        "total_time_in_millis" : 2044,
        "total_size_in_bytes" : 24625858,
        "avg_time_in_millis" : 2,
        "avg_size_in_bytes" : 21527
      },
      "dense_vector" : {
        "value_count" : 0
      }
    },
    "total" : {
      "docs" : {
        "count" : 181262,
        "deleted" : 5390
      },
      "shard_stats" : {
        "total_count" : 2
      },
      "store" : {
        "size_in_bytes" : 21603531,
        "total_data_set_size_in_bytes" : 21603531,
        "reserved_in_bytes" : 0
      },
      "indexing" : {
        "index_total" : 2876,
        "index_time_in_millis" : 3408,
        "index_current" : 0,
        "index_failed" : 0,
        "delete_total" : 44,
        "delete_time_in_millis" : 42,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0,
        "write_load" : 5.334470640124476E-6
      },
      "get" : {
        "total" : 0,
        "time_in_millis" : 0,
        "exists_total" : 0,
        "exists_time_in_millis" : 0,
        "missing_total" : 0,
        "missing_time_in_millis" : 0,
        "current" : 0
      },
      "search" : {
        "open_contexts" : 0,
        "query_total" : 216,
        "query_time_in_millis" : 608,
        "query_current" : 0,
        "fetch_total" : 85,
        "fetch_time_in_millis" : 3660,
        "fetch_current" : 0,
        "scroll_total" : 0,
        "scroll_time_in_millis" : 0,
        "scroll_current" : 0,
        "suggest_total" : 0,
        "suggest_time_in_millis" : 0,
        "suggest_current" : 0
      },
      "merges" : {
        "current" : 0,
        "current_docs" : 0,
        "current_size_in_bytes" : 0,
        "total" : 8,
        "total_time_in_millis" : 1080,
        "total_docs" : 113235,
        "total_size_in_bytes" : 17183476,
        "total_stopped_time_in_millis" : 0,
        "total_throttled_time_in_millis" : 0,
        "total_auto_throttle_in_bytes" : 41943040
      },
      "refresh" : {
        "total" : 909,
        "total_time_in_millis" : 8559,
        "external_total" : 505,
        "external_total_time_in_millis" : 10081,
        "listeners" : 0
      },
      "flush" : {
        "total" : 399,
        "periodic" : 399,
        "total_time_in_millis" : 21825
      },
      "warmer" : {
        "current" : 0,
        "total" : 503,
        "total_time_in_millis" : 188
      },
      "query_cache" : {
        "memory_size_in_bytes" : 262211,
        "total_count" : 1213,
        "hit_count" : 160,
        "miss_count" : 1053,
        "cache_size" : 36,
        "cache_count" : 44,
        "evictions" : 8
      },
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 12,
        "global_ordinals" : {
          "build_time_in_millis" : 49
        }
      },
      "completion" : {
        "size_in_bytes" : 0
      },
      "segments" : {
        "count" : 14,
        "memory_in_bytes" : 0,
        "terms_memory_in_bytes" : 0,
        "stored_fields_memory_in_bytes" : 0,
        "term_vectors_memory_in_bytes" : 0,
        "norms_memory_in_bytes" : 0,
        "points_memory_in_bytes" : 0,
        "doc_values_memory_in_bytes" : 0,
        "index_writer_memory_in_bytes" : 0,
        "version_map_memory_in_bytes" : 0,
        "fixed_bit_set_memory_in_bytes" : 190944,
        "max_unsafe_auto_id_timestamp" : -1,
        "file_sizes" : { }
      },
      "translog" : {
        "operations" : 0,
        "size_in_bytes" : 110,
        "uncommitted_operations" : 0,
        "uncommitted_size_in_bytes" : 110,
        "earliest_last_modified_age" : 681707
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 3,
        "miss_count" : 29
      },
      "recovery" : {
        "current_as_source" : 0,
        "current_as_target" : 0,
        "throttle_time_in_millis" : 0
      },
      "bulk" : {
        "total_operations" : 322,
        "total_time_in_millis" : 3651,
        "total_size_in_bytes" : 49251716,
        "avg_time_in_millis" : 2,
        "avg_size_in_bytes" : 21527
      },
      "dense_vector" : {
        "value_count" : 0
      }
    }
  },
  "indices" : {
    "myindex_products" : {
      "uuid" : "UQ7QJ6WNQJG6XIYFx-h6OQ",
      "health" : "green",
      "status" : "open",
      "primaries" : {
        "docs" : {
          "count" : 90631,
          "deleted" : 2267
        },
        "shard_stats" : {
          "total_count" : 1
        },
        "store" : {
          "size_in_bytes" : 10714136,
          "total_data_set_size_in_bytes" : 10714136,
          "reserved_in_bytes" : 0
        },
        "indexing" : {
          "index_total" : 1438,
          "index_time_in_millis" : 1832,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 22,
          "delete_time_in_millis" : 24,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0,
          "write_load" : 5.735608652774992E-6
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 94,
          "query_time_in_millis" : 216,
          "query_current" : 0,
          "fetch_total" : 57,
          "fetch_time_in_millis" : 3594,
          "fetch_current" : 0,
          "scroll_total" : 0,
          "scroll_time_in_millis" : 0,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 0,
          "current_docs" : 0,
          "current_size_in_bytes" : 0,
          "total" : 4,
          "total_time_in_millis" : 828,
          "total_docs" : 100215,
          "total_size_in_bytes" : 13408159,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 0,
          "total_auto_throttle_in_bytes" : 20971520
        },
        "refresh" : {
          "total" : 457,
          "total_time_in_millis" : 3694,
          "external_total" : 253,
          "external_total_time_in_millis" : 4159,
          "listeners" : 0
        },
        "flush" : {
          "total" : 200,
          "periodic" : 200,
          "total_time_in_millis" : 10488
        },
        "warmer" : {
          "current" : 0,
          "total" : 252,
          "total_time_in_millis" : 72
        },
        "query_cache" : {
          "memory_size_in_bytes" : 172390,
          "total_count" : 823,
          "hit_count" : 116,
          "miss_count" : 707,
          "cache_size" : 22,
          "cache_count" : 30,
          "evictions" : 8
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 6,
          "global_ordinals" : {
            "build_time_in_millis" : 32
          }
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 7,
          "memory_in_bytes" : 0,
          "terms_memory_in_bytes" : 0,
          "stored_fields_memory_in_bytes" : 0,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 0,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 0,
          "index_writer_memory_in_bytes" : 0,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 95056,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 0,
          "size_in_bytes" : 55,
          "uncommitted_operations" : 0,
          "uncommitted_size_in_bytes" : 55,
          "earliest_last_modified_age" : 681709
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 3,
          "miss_count" : 20
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        },
        "bulk" : {
          "total_operations" : 161,
          "total_time_in_millis" : 2044,
          "total_size_in_bytes" : 24625858,
          "avg_time_in_millis" : 2,
          "avg_size_in_bytes" : 21527
        },
        "dense_vector" : {
          "value_count" : 0
        }
      },
      "total" : {
        "docs" : {
          "count" : 181262,
          "deleted" : 5390
        },
        "shard_stats" : {
          "total_count" : 2
        },
        "store" : {
          "size_in_bytes" : 21603531,
          "total_data_set_size_in_bytes" : 21603531,
          "reserved_in_bytes" : 0
        },
        "indexing" : {
          "index_total" : 2876,
          "index_time_in_millis" : 3408,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 44,
          "delete_time_in_millis" : 42,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0,
          "write_load" : 5.334470640124476E-6
        },
        "get" : {
          "total" : 0,
          "time_in_millis" : 0,
          "exists_total" : 0,
          "exists_time_in_millis" : 0,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 216,
          "query_time_in_millis" : 608,
          "query_current" : 0,
          "fetch_total" : 85,
          "fetch_time_in_millis" : 3660,
          "fetch_current" : 0,
          "scroll_total" : 0,
          "scroll_time_in_millis" : 0,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 0,
          "current_docs" : 0,
          "current_size_in_bytes" : 0,
          "total" : 8,
          "total_time_in_millis" : 1080,
          "total_docs" : 113235,
          "total_size_in_bytes" : 17183476,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 0,
          "total_auto_throttle_in_bytes" : 41943040
        },
        "refresh" : {
          "total" : 909,
          "total_time_in_millis" : 8559,
          "external_total" : 505,
          "external_total_time_in_millis" : 10081,
          "listeners" : 0
        },
        "flush" : {
          "total" : 399,
          "periodic" : 399,
          "total_time_in_millis" : 21825
        },
        "warmer" : {
          "current" : 0,
          "total" : 503,
          "total_time_in_millis" : 188
        },
        "query_cache" : {
          "memory_size_in_bytes" : 262211,
          "total_count" : 1213,
          "hit_count" : 160,
          "miss_count" : 1053,
          "cache_size" : 36,
          "cache_count" : 44,
          "evictions" : 8
        },
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 12,
          "global_ordinals" : {
            "build_time_in_millis" : 49
          }
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 14,
          "memory_in_bytes" : 0,
          "terms_memory_in_bytes" : 0,
          "stored_fields_memory_in_bytes" : 0,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 0,
          "points_memory_in_bytes" : 0,
          "doc_values_memory_in_bytes" : 0,
          "index_writer_memory_in_bytes" : 0,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 190944,
          "max_unsafe_auto_id_timestamp" : -1,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 0,
          "size_in_bytes" : 110,
          "uncommitted_operations" : 0,
          "uncommitted_size_in_bytes" : 110,
          "earliest_last_modified_age" : 681707
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 3,
          "miss_count" : 29
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        },
        "bulk" : {
          "total_operations" : 322,
          "total_time_in_millis" : 3651,
          "total_size_in_bytes" : 49251716,
          "avg_time_in_millis" : 2,
          "avg_size_in_bytes" : 21527
        },
        "dense_vector" : {
          "value_count" : 0
        }
      }
    }
  }
}
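
As a rough reading aid for the stats above, the cumulative search counters can be turned into average per-shard latencies. The sketch below is plain Python with a hypothetical helper name (avg_ms); the numbers are copied from the "primaries" section. These counters cover only 94 queries, so they probably do not reflect behaviour under the load test.

```python
def avg_ms(total_time_ms, count):
    """Average time per operation in milliseconds (0.0 when nothing was counted)."""
    return total_time_ms / count if count else 0.0


# Values copied from "primaries" -> "search" in the index stats above.
primaries_search = {
    "query_total": 94,
    "query_time_in_millis": 216,
    "fetch_total": 57,
    "fetch_time_in_millis": 3594,
}

avg_query_ms = avg_ms(primaries_search["query_time_in_millis"],
                      primaries_search["query_total"])
avg_fetch_ms = avg_ms(primaries_search["fetch_time_in_millis"],
                      primaries_search["fetch_total"])

if __name__ == "__main__":
    print(f"avg query phase: {avg_query_ms:.1f} ms")
    print(f"avg fetch phase: {avg_fetch_ms:.1f} ms")
```

On these numbers the fetch phase is on average far more expensive than the query phase, which suggests checking how large the returned documents are.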
 

The storage should be Longhorn.
Regarding your question about indexing or updating while querying: a separate service updates the documents with bulk operations whenever there are changes in the BO. When we run load tests there are usually no changes at all, so there shouldn't be any updates while we are querying.

Could you please provide some insight into my two questions below?

  • Would combining the two nested queries into one (which I could do) give better performance, or does Elasticsearch already do that under the hood?
  • Besides using filters to improve performance (without removing the nested fields), is there anything else I could do?

Can you please provide the full output of the cluster stats API?

Your index looks quite small, so it should be completely cached in the operating system page cache, which is good news.

Do you have monitoring installed? If so, what does CPU usage look like at the load level you described? Do you see any evidence of long or frequent GC in the Elasticsearch logs?

That I will need to leave to someone who knows the internals around this better.

Here:

{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "elastic-ddg",
  "cluster_uuid": "qKhraERXTRCEq4wXjJU0aA",
  "timestamp": 1712918592575,
  "status": "green",
  "indices": {
    "count": 34,
    "shards": {
      "total": 69,
      "primaries": 34,
      "replication": 1.0294117647058822,
      "index": {
        "shards": {
          "min": 2,
          "max": 3,
          "avg": 2.0294117647058822
        },
        "primaries": {
          "min": 1,
          "max": 1,
          "avg": 1
        },
        "replication": {
          "min": 1,
          "max": 2,
          "avg": 1.0294117647058822
        }
      }
    },
    "docs": {
      "count": 182240,
      "deleted": 2847
    },
    "store": {
      "size_in_bytes": 49857308,
      "total_data_set_size_in_bytes": 49857308,
      "reserved_in_bytes": 0
    },
    "fielddata": {
      "memory_size_in_bytes": 0,
      "evictions": 175,
      "global_ordinals": {
        "build_time_in_millis": 854
      }
    },
    "query_cache": {
      "memory_size_in_bytes": 17898542,
      "total_count": 17052970,
      "hit_count": 3549152,
      "miss_count": 13503818,
      "cache_size": 2228,
      "cache_count": 18559,
      "evictions": 16331
    },
    "completion": {
      "size_in_bytes": 0
    },
    "segments": {
      "count": 116,
      "memory_in_bytes": 0,
      "terms_memory_in_bytes": 0,
      "stored_fields_memory_in_bytes": 0,
      "term_vectors_memory_in_bytes": 0,
      "norms_memory_in_bytes": 0,
      "points_memory_in_bytes": 0,
      "doc_values_memory_in_bytes": 0,
      "index_writer_memory_in_bytes": 0,
      "version_map_memory_in_bytes": 0,
      "fixed_bit_set_memory_in_bytes": 376512,
      "max_unsafe_auto_id_timestamp": 1711358322868,
      "file_sizes": {}
    },
    "mappings": {
      "total_field_count": 8856,
      "total_deduplicated_field_count": 6652,
      "total_deduplicated_mapping_size_in_bytes": 47159,
      "field_types": [
        {
          "name": "alias",
          "count": 76,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "binary",
          "count": 1,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "boolean",
          "count": 173,
          "index_count": 18,
          "script_count": 0
        },
        {
          "name": "byte",
          "count": 3,
          "index_count": 3,
          "script_count": 0
        },
        {
          "name": "constant_keyword",
          "count": 3,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "date",
          "count": 379,
          "index_count": 19,
          "script_count": 0
        },
        {
          "name": "date_range",
          "count": 9,
          "index_count": 9,
          "script_count": 0
        },
        {
          "name": "double",
          "count": 32,
          "index_count": 6,
          "script_count": 0
        },
        {
          "name": "flattened",
          "count": 53,
          "index_count": 9,
          "script_count": 0
        },
        {
          "name": "float",
          "count": 64,
          "index_count": 9,
          "script_count": 0
        },
        {
          "name": "geo_point",
          "count": 32,
          "index_count": 4,
          "script_count": 0
        },
        {
          "name": "integer",
          "count": 60,
          "index_count": 4,
          "script_count": 0
        },
        {
          "name": "ip",
          "count": 69,
          "index_count": 5,
          "script_count": 0
        },
        {
          "name": "keyword",
          "count": 5170,
          "index_count": 21,
          "script_count": 0
        },
        {
          "name": "long",
          "count": 697,
          "index_count": 17,
          "script_count": 0
        },
        {
          "name": "match_only_text",
          "count": 360,
          "index_count": 4,
          "script_count": 0
        },
        {
          "name": "nested",
          "count": 94,
          "index_count": 8,
          "script_count": 0
        },
        {
          "name": "object",
          "count": 1427,
          "index_count": 19,
          "script_count": 0
        },
        {
          "name": "rank_features",
          "count": 1,
          "index_count": 1,
          "script_count": 0
        },
        {
          "name": "scaled_float",
          "count": 30,
          "index_count": 7,
          "script_count": 0
        },
        {
          "name": "text",
          "count": 24,
          "index_count": 9,
          "script_count": 0
        },
        {
          "name": "version",
          "count": 11,
          "index_count": 11,
          "script_count": 0
        },
        {
          "name": "wildcard",
          "count": 88,
          "index_count": 4,
          "script_count": 0
        }
      ],
      "runtime_field_types": []
    },
    "analysis": {
      "char_filter_types": [],
      "tokenizer_types": [],
      "filter_types": [],
      "analyzer_types": [],
      "built_in_char_filters": [],
      "built_in_tokenizers": [],
      "built_in_filters": [],
      "built_in_analyzers": [],
      "synonyms": {}
    },
    "versions": [
      {
        "version": "8500010",
        "index_count": 34,
        "primary_shard_count": 34,
        "total_primary_bytes": 24833063
      }
    ],
    "search": {
      "total": 3314801,
      "queries": {
        "bool": 3312594,
        "prefix": 15,
        "match": 4491,
        "range": 3010484,
        "nested": 2529675,
        "wildcard": 16,
        "match_phrase": 52,
        "terms": 216137,
        "match_phrase_prefix": 18,
        "match_all": 83,
        "exists": 272682,
        "term": 3097742,
        "query_string": 1,
        "simple_query_string": 42806
      },
      "rescorers": {},
      "sections": {
        "highlight": 11,
        "stored_fields": 19,
        "runtime_mappings": 258927,
        "query": 3312807,
        "script_fields": 19,
        "terminate_after": 91,
        "pit": 757,
        "_source": 4714,
        "fields": 1880,
        "collapse": 65,
        "aggs": 820268
      }
    },
    "dense_vector": {
      "value_count": 0
    }
  },
  "nodes": {
    "count": {
      "total": 3,
      "coordinating_only": 0,
      "data": 3,
      "data_cold": 0,
      "data_content": 0,
      "data_frozen": 0,
      "data_hot": 0,
      "data_warm": 0,
      "index": 0,
      "ingest": 0,
      "master": 3,
      "ml": 0,
      "remote_cluster_client": 0,
      "search": 0,
      "transform": 0,
      "voting_only": 0
    },
    "versions": [
      "8.12.1"
    ],
    "os": {
      "available_processors": 3,
      "allocated_processors": 3,
      "names": [
        {
          "name": "Linux",
          "count": 3
        }
      ],
      "pretty_names": [
        {
          "pretty_name": "Ubuntu 20.04.6 LTS",
          "count": 3
        }
      ],
      "architectures": [
        {
          "arch": "amd64",
          "count": 3
        }
      ],
      "mem": {
        "total_in_bytes": 12884901888,
        "adjusted_total_in_bytes": 12884901888,
        "free_in_bytes": 5267587072,
        "used_in_bytes": 7617314816,
        "free_percent": 41,
        "used_percent": 59
      }
    },
    "process": {
      "cpu": {
        "percent": 5
      },
      "open_file_descriptors": {
        "min": 581,
        "max": 591,
        "avg": 587
      }
    },
    "jvm": {
      "max_uptime_in_millis": 1564891314,
      "versions": [
        {
          "version": "21.0.2",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "21.0.2+13-58",
          "vm_vendor": "Oracle Corporation",
          "bundled_jdk": true,
          "using_bundled_jdk": true,
          "count": 3
        }
      ],
      "mem": {
        "heap_used_in_bytes": 2331555920,
        "heap_max_in_bytes": 6442450944
      },
      "threads": 161
    },
    "fs": {
      "total_in_bytes": 31392067584,
      "free_in_bytes": 31335301120,
      "available_in_bytes": 31284969472
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "security4": 3
      },
      "http_types": {
        "security4": 3
      }
    },
    "discovery_types": {
      "multi-node": 3
    },
    "packaging_types": [
      {
        "flavor": "default",
        "type": "docker",
        "count": 3
      }
    ],
    "ingest": {
      "number_of_pipelines": 16,
      "processor_stats": {
        "attachment": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "date_index_name": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "dot_expander": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "foreach": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "geoip": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "gsub": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "inference": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "json": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "pipeline": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "remove": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "rename": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "script": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "set": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "set_security_user": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "split": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "trim": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "uri_parts": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        },
        "user_agent": {
          "count": 0,
          "failed": 0,
          "current": 0,
          "time_in_millis": 0
        }
      }
    },
    "indexing_pressure": {
      "memory": {
        "current": {
          "combined_coordinating_and_primary_in_bytes": 0,
          "coordinating_in_bytes": 0,
          "primary_in_bytes": 0,
          "replica_in_bytes": 0,
          "all_in_bytes": 0
        },
        "total": {
          "combined_coordinating_and_primary_in_bytes": 0,
          "coordinating_in_bytes": 0,
          "primary_in_bytes": 0,
          "replica_in_bytes": 0,
          "all_in_bytes": 0,
          "coordinating_rejections": 0,
          "primary_rejections": 0,
          "replica_rejections": 0
        },
        "limit_in_bytes": 0
      }
    }
  },
  "snapshots": {
    "current_counts": {
      "snapshots": 0,
      "shard_snapshots": 0,
      "snapshot_deletions": 0,
      "concurrent_operations": 0,
      "cleanups": 0
    },
    "repositories": {}
  }
}

We have Grafana. I notice an increase in CPU usage on my Elasticsearch pods, but apart from that, nothing: memory is always very stable, and even the CPU increase is not that big (and we have a lot of resources allocated).

What is the CPU allocation for these pods?

The CPU load was for around 60000 requests in 1 minute.

Even when it's using cache:

   "myindex_products": {
      "uuid": "xFHMqEHuSs-a4Nby_N7P2g",
      "health": "green",
      "status": "open",
      "primaries": {
        "request_cache": {
          "memory_size": "20.4mb",
          "memory_size_in_bytes": 21474816,
          "evictions": 361103,
          "hit_count": 44248,
          "miss_count": 403595
        }
      },
      "total": {
        "request_cache": {
          "memory_size": "40.9mb",
          "memory_size_in_bytes": 42943712,
          "evictions": 674745,
          "hit_count": 82324,
          "miss_count": 752064
        }
      }
    }

Some requests take 40 seconds to complete under heavy load (I'm using the .NET 8 client, as I mentioned earlier). I saw hit_count increasing every time while miss_count stayed at the same value, yet my Elasticsearch requests still degrade.
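
To put the request cache counters above into perspective, the hit ratio and the share of misses that ended in an eviction can be computed directly. This is a small Python sketch with hypothetical helper names; the numbers are copied from the "primaries" section of the stats above.

```python
def hit_ratio(hit_count, miss_count):
    """Fraction of cache lookups that were served from the cache."""
    lookups = hit_count + miss_count
    return hit_count / lookups if lookups else 0.0


def eviction_ratio(evictions, miss_count):
    """Evictions per cache miss; values near 1 suggest heavy cache churn."""
    return evictions / miss_count if miss_count else 0.0


# Values copied from "primaries" -> "request_cache" above.
primaries = {"hit_count": 44248, "miss_count": 403595, "evictions": 361103}

primaries_hit_ratio = hit_ratio(primaries["hit_count"], primaries["miss_count"])
primaries_eviction_ratio = eviction_ratio(primaries["evictions"], primaries["miss_count"])

if __name__ == "__main__":
    print(f"hit ratio: {primaries_hit_ratio:.1%}")
    print(f"evictions per miss: {primaries_eviction_ratio:.2f}")
```

Over the lifetime of these counters only about one lookup in ten was a hit, and most cached entries were evicted again, so a rising hit_count during one test run does not necessarily mean most requests are served from the cache.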

You did not answer my question.

The graph seems to indicate that it is initially set to 1 CPU core but may increase to 2, but I am not sure if I am reading that correctly. If that is correct it sounds like a very low level. I would recommend increasing it and see what impact that has.
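
A back-of-the-envelope sketch of why the CPU allocation can matter even when the pods look idle: the default size of the search thread pool is derived from the number of allocated processors, int((allocated processors * 3) / 2) + 1, with a queue of 1000. The Python below uses that documented default; everything else (the function names, treating each request as one shard-level task, and spreading load evenly over shard copies) is a simplifying assumption.

```python
def search_pool_size(allocated_processors):
    """Default size of the search thread pool for a node."""
    return (allocated_processors * 3) // 2 + 1


def max_avg_service_time_ms(requests_per_minute, shard_copies, processors_per_node):
    """Average time each shard-level search may take before the queues start to grow.

    Assumes one shard copy per node and an even spread of requests.
    """
    threads = shard_copies * search_pool_size(processors_per_node)
    requests_per_second = requests_per_minute / 60.0
    return 1000.0 * threads / requests_per_second


if __name__ == "__main__":
    for cores in (1, 2, 4):
        budget = max_avg_service_time_ms(40000, shard_copies=2, processors_per_node=cores)
        print(f"{cores} core(s): {search_pool_size(cores)} search threads per node, "
              f"budget {budget:.1f} ms per search")
```

With 1 allocated processor per node and two shard copies, the budget at 40000 requests per minute is only a few milliseconds per search; anything slower queues up, which would show as rising latency rather than as high memory usage.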

It looks like you are querying a single index with a single primary shard and one replica, is that correct? If so, I would recommend increasing the number of replicas to 2 so all nodes hold a copy of the shard.
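
The replica change suggested above is an update to the index settings. As a minimal sketch, the Python below only builds the method, path and JSON body for that call; the helper name replica_settings_request is hypothetical, and the index name is taken from the stats posted earlier. Sending it is left to whatever client or tool is in use.

```python
import json


def replica_settings_request(index_name, replicas):
    """Return (method, path, body) for an update-index-settings request."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", "/" + index_name + "/_settings", json.dumps(body)


if __name__ == "__main__":
    method, path, body = replica_settings_request("myindex_products", 2)
    print(method, path)
    print(body)
```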

Once that is done I would recommend slowly stepping up the number of concurrent queries over a period of time and monitoring how CPU usage and latency vary with increased concurrency. Run queries at a certain level for a couple of minutes and then increase it. Run queries at that level of concurrency for a few minutes and then increase it again. Repeat until your queries no longer meet the latency you require.
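
The stepped test described above could be scripted roughly as follows. This is a sketch, not a full load-testing tool: query_fn is a hypothetical callable that sends one search and blocks until the response arrives, and each level runs a fixed number of requests instead of a fixed number of minutes to keep the example short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(query_fn):
    """Run one query and return its latency in milliseconds."""
    start = time.perf_counter()
    query_fn()
    return (time.perf_counter() - start) * 1000.0


def run_level(query_fn, concurrency, total_requests):
    """Send total_requests queries using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, query_fn) for _ in range(total_requests)]
        return sorted(future.result() for future in futures)


def percentile(sorted_values, fraction):
    """Nearest-rank style percentile of an already sorted, non-empty list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


def step_up(query_fn, levels, requests_per_level, max_p95_ms):
    """Raise concurrency level by level until the p95 latency exceeds the limit."""
    report = []
    for concurrency in levels:
        latencies = run_level(query_fn, concurrency, requests_per_level)
        p95 = percentile(latencies, 0.95)
        report.append({"concurrency": concurrency, "p95_ms": p95})
        if p95 > max_p95_ms:
            break  # latency requirement no longer met, stop here
    return report


if __name__ == "__main__":
    # Stand-in for a real search call; replace with the actual client request.
    fake_query = lambda: time.sleep(0.002)
    for row in step_up(fake_query, [1, 2, 4, 8], 50, max_p95_ms=500.0):
        print(row)
```

Recording CPU usage from the monitoring system next to each row makes it easier to see at which concurrency level the nodes saturate.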

CPU allocation is up to 2 cores. I can increase the resources and test it all over again, but I'm not very confident that will help. The pods seem very stable resource-wise, and getting timeouts on my Elasticsearch requests while I'm seeing cache hits seems quite odd to me.

Can someone point me in the right direction regarding my questions, please? That way I'll have a better understanding of how things work in the background and can make better decisions going forward:

  • Would combining the two nested queries into one (which I could do) give better performance, or does Elasticsearch already do that under the hood?
  • Besides using filters to improve performance (without removing the nested fields), is there anything else I could do?
  • In the example I provided, is it normal to still get timeouts (100 seconds) or very slow requests under load even though I'm seeing cache hits on my index? Is it actually using the cache? If so, how exactly does it work? It seems odd to me.
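
On the caching question, one thing that can be tried on the client side: caches are keyed on the content of the request, so a range bound that changes every second produces a different query each time. Rounding the timestamp makes consecutive requests identical for a while. The Python sketch below shows the idea with hypothetical helper names; the trade-off is that offers starting or expiring within the current minute may be evaluated up to a minute off.

```python
from datetime import datetime


def round_down_to_minute(timestamp):
    """Drop seconds and microseconds so the value changes once per minute."""
    return timestamp.replace(second=0, microsecond=0)


def range_timestamp(timestamp):
    """Format the rounded timestamp in the same style as the original query."""
    return round_down_to_minute(timestamp).strftime("%Y-%m-%dT%H:%M:%S")


if __name__ == "__main__":
    # Two requests 20 seconds apart end up with the same range bound.
    print(range_timestamp(datetime(2024, 3, 25, 9, 10, 19)))
    print(range_timestamp(datetime(2024, 3, 25, 9, 10, 39)))
```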

And thank you, Christian, for your input and insights. I will experiment with the resources once I have more answers.