Scaling of cardinality sub-aggregation

Hello there,

As a continuation of this discussion thread: what other options are there in Elasticsearch for scaling aggregations?

I have experimented with:

  • adding client nodes (didn't help with big result sets)
  • playing with "precision_threshold" in the inner aggregation
  • adding more shards and replicas (from 50 to 100 shards, 0-2 replicas)
  • increasing the refresh interval (from 10s to 30s)
  • changing some merge/translog index settings
  • no heap limits hit and no other resource shortages observed (~700GB of data, 3 client nodes, 17 data nodes)

but without any significant performance improvement (my goal is to get under 1 sec).
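To tell whether a single experiment actually moves the request toward the 1 sec goal, it may help to repeat the same search several times and compare latency percentiles rather than single runs. A minimal Python sketch, assuming the `took` values (in milliseconds) have already been collected from repeated search responses; `summarize_took` is a hypothetical helper, and the values in the usage line are illustrative, not measurements:

```python
import statistics


def summarize_took(took_ms, target_ms=1000):
    """Summarize repeated search latencies (the "took" field, in ms).

    Returns the median, the 95th percentile and the share of runs that
    finished below the target latency. Needs at least two samples.
    """
    if len(took_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th
    # percentile. "inclusive" interpolates within the observed range.
    p95 = statistics.quantiles(took_ms, n=20, method="inclusive")[-1]
    share = sum(1 for t in took_ms if t < target_ms) / len(took_ms)
    return {
        "median_ms": statistics.median(took_ms),
        "p95_ms": p95,
        "share_under_target": share,
    }


# Illustrative input only: five runs of the same aggregation request.
print(summarize_took([2400, 2350, 2500, 2420, 2380]))
```

Comparing the median and p95 before and after each change (shard count, refresh interval, eager global ordinals) makes small improvements visible that a single run could hide.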

I'm already testing the effect of enabling "eager_global_ordinals" in the mapping for all outer aggregations.

Thank you for any advice or direction,
Dominik

What version of Elasticsearch are you using?

Also, could you provide the mappings & queries you're using, as well as some example data, and example profiling outputs?


Hi @BenB196,

First of all, thanks for reaching out.

We are using ES version 7.17.6. The cluster has 23 nodes: 17 data nodes (1TB storage, 64GB RAM), 3 client nodes and 3 master nodes.

Reproducing the full scenario would be very complex (we use a query template with facets and highlighting, and the mapping contains 100+ fields).

So I have isolated only the 27 fields required by the 14 aggregations.

Mapping
{
	"settings": {
		"number_of_shards": 1,
		"number_of_replicas": 1,
		"refresh_interval": "10s",
		"default_pipeline": "treaty_ingestion_pipeline"
	},
	"mappings": {
		"dynamic_date_formats": [
			"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||ordinal_date_time_no_millis||date_optional_time"
		],
		"date_detection": true,
		"numeric_detection": false,
		"properties": {
			"content": {
				"properties": {
					"DOC_DOC_NM": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"null_value": "NULL_VALUE",
								"ignore_above": 2000
							}
						}
					},
					"DOC_DEAL_NO": {
						"type": "keyword",
						"eager_global_ordinals": true,
						"null_value": "NULL_VALUE",
						"ignore_above": 2000,
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 2000
							},
							"hash": {
								"type": "murmur3"
							}
						}
					},
					"ACCP_TYP_OF_AGRMNT_DESC_LV2": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"ignore_above": 2000
							}
						}
					},
					"ACCP_TYP_OF_AGRMNT_DESC_LV3": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"ignore_above": 2000
							}
						}
					},
					"ACCP_MAIN_LOB_ID_LST": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 2000
							}
						}
					},
					"ACCP_ACCP_PRFT_CNTR_DESC": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 2000
							}
						}
					},
					"ACCP_ACCP_PRFT_CNTR_DESC_LV1": {
						"type": "keyword",
						"eager_global_ordinals": true
					},
					"ACCP_ACCP_PRFT_CNTR_DESC_LV2": {
						"type": "keyword",
						"eager_global_ordinals": true,
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 256
							}
						}
					},
					"ACCP_ACCP_PRFT_CNTR_DESC_LV3": {
						"type": "keyword",
						"eager_global_ordinals": true
					},
					"ACCP_ACCP_PRFT_CNTR_DESC_LV4": {
						"type": "keyword",
						"eager_global_ordinals": true
					},
					"ACCP_ACCP_PRFT_CNTR_DESC_LV5": {
						"type": "keyword",
						"eager_global_ordinals": true
					},
					"ACCP_UWRT_YR": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 2000
							}
						}
					},
					"ACCP_UWRT_OBJ_STAT_DESC": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 1020
							}
						}
					},
					"ACCP_TYP_OF_AGRMNT_DESC": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 1020
							}
						}
					},
					"ACCP_MAIN_BRKR_CTRY": {
						"type": "text",
						"fields": {
							"keyword": {
								"null_value": "NULL_VALUE",
								"type": "keyword",
								"eager_global_ordinals": true,
								"ignore_above": 1024
							}
						}
					},
					"ACCP_MAIN_BRKR_LGL_CCAT": {
						"type": "text",
						"fields": {
							"keyword": {
								"null_value": "NULL_VALUE",
								"eager_global_ordinals": true,
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"ACCP_MAIN_CARR_LGL_CCAT": {
						"type": "text",
						"fields": {
							"keyword": {
								"null_value": "NULL_VALUE",
								"eager_global_ordinals": true,
								"type": "keyword",
								"ignore_above": 256
							}
						}
					},
					"ACCP_MAIN_CLI_CCAT": {
						"type": "keyword",
						"null_value": "NULL_VALUE",
						"ignore_above": 2000,
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 2000
							}
						}
					},
					"ACCP_MAIN_CLI_CTRY": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 1020
							}
						}
					},
					"ACCP_BUS_PARTIC_TYP_DESC": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 1020
							}
						}
					},
					"ACCP_BUS_PARTIC_TYP_DESC_LV1": {
						"type": "keyword",
						"eager_global_ordinals": true
					},
					"ACCP_BUS_PARTIC_TYP_DESC_LV2": {
						"type": "keyword",
						"eager_global_ordinals": true
					},
					"ACCP_BUS_PARTIC_TYP_DESC_LV3": {
						"type": "keyword",
						"eager_global_ordinals": true
					},
					"ACCP_MAIN_CLI_GRP_LGL_NM": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"eager_global_ordinals": true,
								"null_value": "NULL_VALUE",
								"ignore_above": 1020
							}
						}
					}
				}
			}
		}
	}
}
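Because the mapping above is trimmed to the fields the aggregations use, one quick sanity check is to confirm that every aggregated field is actually mapped and whether it has "eager_global_ordinals" set. For example, the sub-aggregation query also aggregates on content.DOC_CNTNT_CLSFCN.keyword and content.DOC_ACTIV_GRP_TYP_LST.keyword, which do not appear in this trimmed mapping. A minimal Python sketch, assuming the "mappings" object has been loaded with json.load (so JSON true becomes True); flatten_mapping and check_agg_fields are hypothetical helpers that only read the mapping and do not contact the cluster:

```python
def flatten_mapping(properties, prefix=""):
    """Flatten a mapping "properties" object into {dotted.path: definition}.

    Object fields (with their own "properties") are recursed into, and
    multi-fields listed under "fields" get their own entries, so
    "content.DOC_DEAL_NO.hash" is reported separately from "content.DOC_DEAL_NO".
    """
    flat = {}
    for name, definition in properties.items():
        path = prefix + name
        if "properties" in definition:
            flat.update(flatten_mapping(definition["properties"], path + "."))
            continue
        flat[path] = definition
        for sub_name, sub_definition in definition.get("fields", {}).items():
            flat[path + "." + sub_name] = sub_definition
    return flat


def check_agg_fields(mappings, agg_fields):
    """Classify each aggregated field as 'missing', 'not keyword', 'eager' or 'lazy'.

    Only keyword fields are classified by their eager_global_ordinals flag;
    other types (e.g. the murmur3 "hash" sub-field) are reported as 'not keyword'.
    """
    flat = flatten_mapping(mappings["properties"])
    report = {}
    for field in agg_fields:
        definition = flat.get(field)
        if definition is None:
            report[field] = "missing"
        elif definition.get("type") != "keyword":
            report[field] = "not keyword"
        elif definition.get("eager_global_ordinals", False):
            report[field] = "eager"
        else:
            report[field] = "lazy"
    return report
```

Running it against the full index mapping (rather than this excerpt) with the list of fields from the query would show whether any aggregated keyword field still builds its global ordinals lazily at search time.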

simple aggregations (0.3s)
GET treaty_v4/_search?human=true
{
  "profile": true, 
  "size": 0, 
	"aggs": {
		"partners": {
			"terms": {
				"field": "content.ACCP_MAIN_CLI_CCAT.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 500
			}
		},
		"profitCenters": {
			"terms": {
				"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1",
				"size": 10
			},
			"aggs": {
				"profitCenters_lv2": {
					"terms": {
						"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV2",
						"size": 20
					},
					"aggs": {
						"profitCenters_lv3": {
							"terms": {
								"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV3",
								"size": 100
							},
							"aggs": {
								"profitCenters_lv4": {
									"terms": {
										"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV4",
										"size": 100
									},
									"aggs": {
										"profitCenters_lv5": {
											"terms": {
												"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV5",
												"size": 100
											}
										}
									}
								}
							}
						}
					}
				}
			}
		},
		"lobIdList": {
			"terms": {
				"field": "content.ACCP_MAIN_LOB_ID_LST.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 500
			}
		},
		"clientCountry": {
			"terms": {
				"field": "content.ACCP_MAIN_CLI_CTRY.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 1000
			}
		},
		"typeOfBusiness": {
			"terms": {
				"field": "content.ACCP_BUS_PARTIC_TYP_DESC_LV1",
				"size": 10
			},
			"aggs": {
				"typeOfBusiness_lv2": {
					"terms": {
						"field": "content.ACCP_BUS_PARTIC_TYP_DESC_LV2",
						"size": 20
					},
					"aggs": {
						"typeOfBusiness_lv3": {
							"terms": {
								"field": "content.ACCP_BUS_PARTIC_TYP_DESC_LV3",
								"size": 20
							}
						}
					}
				}
			}
		},
		"clientGroup": {
			"terms": {
				"field": "content.ACCP_MAIN_CLI_GRP_LGL_NM.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 1000
			}
		},
		"reinsurer": {
			"terms": {
				"field": "content.ACCP_MAIN_CARR_LGL_CCAT.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 250
			}
		},
		"broker": {
			"terms": {
				"field": "content.ACCP_MAIN_BRKR_LGL_CCAT.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 1000
			}
		},
		"brokerCountry": {
			"terms": {
				"field": "content.ACCP_MAIN_BRKR_CTRY.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 150
			}
		},
		"division": {
			"terms": {
				"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV2",
				"order": {
					"_key": "asc"
				},
				"size": 50
			}
		},
		"uwrtYear": {
			"terms": {
				"field": "content.ACCP_UWRT_YR.keyword",
				"order": {
					"_key": "desc"
				},
				"size": 150
			}
		},
		"uwrtStatus": {
			"terms": {
				"field": "content.ACCP_UWRT_OBJ_STAT_DESC.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 50
			}
		},
		"typeOfAgreement": {
			"terms": {
				"field": "content.ACCP_TYP_OF_AGRMNT_DESC_LV2.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 10
			},
			"aggs": {
				"typeOfAgreement_lv3": {
					"terms": {
						"field": "content.ACCP_TYP_OF_AGRMNT_DESC_LV3.keyword",
						"size": 20
					}
				}
			}
		},
		"dealIds": {
			"terms": {
				"field": "content.DOC_DEAL_NO.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 100
			}
		}
	}
}
sub-aggregations (2.4s)
GET index/_search?human=true
{
  "profile": true, 
  "size": 0, 
	"aggs": {
		"partners": {
			"terms": {
				"field": "content.ACCP_MAIN_CLI_CCAT.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 500
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"profitCenters": {
			"terms": {
				"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1",
				"size": 10
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				},
				"profitCenters_lv2": {
					"terms": {
						"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV2",
						"size": 20
					},
					"aggs": {
						"deals_count": {
							"cardinality": {
								"field": "content.DOC_DEAL_NO.hash",
								"precision_threshold": "3000"
							}
						},
						"profitCenters_lv3": {
							"terms": {
								"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV3",
								"size": 100
							},
							"aggs": {
								"deals_count": {
									"cardinality": {
										"field": "content.DOC_DEAL_NO.hash",
										"precision_threshold": "3000"
									}
								},
								"profitCenters_lv4": {
									"terms": {
										"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV4",
										"size": 100
									},
									"aggs": {
										"deals_count": {
											"cardinality": {
												"field": "content.DOC_DEAL_NO.hash",
												"precision_threshold": "3000"
											}
										},
										"profitCenters_lv5": {
											"terms": {
												"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV5",
												"size": 100
											},
											"aggs": {
												"deals_count": {
													"cardinality": {
														"field": "content.DOC_DEAL_NO.hash",
														"precision_threshold": "3000"
													}
												}
											}
										}
									}
								}
							}
						}
					}
				}
			}
		},
		"lobIdList": {
			"terms": {
				"field": "content.ACCP_MAIN_LOB_ID_LST.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 500
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"clientCountry": {
			"terms": {
				"field": "content.ACCP_MAIN_CLI_CTRY.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 1000
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"typeOfBusiness": {
			"terms": {
				"field": "content.ACCP_BUS_PARTIC_TYP_DESC_LV1",
				"size": 10
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				},
				"typeOfBusiness_lv2": {
					"terms": {
						"field": "content.ACCP_BUS_PARTIC_TYP_DESC_LV2",
						"size": 20
					},
					"aggs": {
						"deals_count": {
							"cardinality": {
								"field": "content.DOC_DEAL_NO.hash",
								"precision_threshold": "3000"
							}
						},
						"typeOfBusiness_lv3": {
							"terms": {
								"field": "content.ACCP_BUS_PARTIC_TYP_DESC_LV3",
								"size": 20
							},
							"aggs": {
								"deals_count": {
									"cardinality": {
										"field": "content.DOC_DEAL_NO.hash",
										"precision_threshold": "3000"
									}
								}
							}
						}
					}
				}
			}
		},
		"contentType": {
			"terms": {
				"field": "content.DOC_CNTNT_CLSFCN.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 20
			}
		},
		"clientGroup": {
			"terms": {
				"field": "content.ACCP_MAIN_CLI_GRP_LGL_NM.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 1000
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"reinsurer": {
			"terms": {
				"field": "content.ACCP_MAIN_CARR_LGL_CCAT.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 250
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"broker": {
			"terms": {
				"field": "content.ACCP_MAIN_BRKR_LGL_CCAT.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 1000
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"brokerCountry": {
			"terms": {
				"field": "content.ACCP_MAIN_BRKR_CTRY.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 150
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"activityGroup": {
			"terms": {
				"field": "content.DOC_ACTIV_GRP_TYP_LST.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 50
			}
		},
		"division": {
			"terms": {
				"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV2",
				"order": {
					"_key": "asc"
				},
				"size": 50
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"uwrtYear": {
			"terms": {
				"field": "content.ACCP_UWRT_YR.keyword",
				"order": {
					"_key": "desc"
				},
				"size": 150
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"uwrtStatus": {
			"terms": {
				"field": "content.ACCP_UWRT_OBJ_STAT_DESC.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 50
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"typeOfAgreement": {
			"terms": {
				"field": "content.ACCP_TYP_OF_AGRMNT_DESC_LV2.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 10
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				},
				"typeOfAgreement_lv3": {
					"terms": {
						"field": "content.ACCP_TYP_OF_AGRMNT_DESC_LV3.keyword",
						"size": 20
					},
					"aggs": {
						"deals_count": {
							"cardinality": {
								"field": "content.DOC_DEAL_NO.hash",
								"precision_threshold": "3000"
							}
						}
					}
				}
			}
		},
		"dealIds": {
			"terms": {
				"field": "content.DOC_DEAL_NO.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 100
			}
		},
		"dealsCount": {
			"cardinality": {
				"field": "content.DOC_DEAL_NO.hash",
				"precision_threshold": 40000
			}
		},
		"dealCards": {
			"aggs": {
				"deal_document_names": {
					"terms": {
						"field": "content.DOC_DOC_NM.keyword",
						"size": 20
					}
				},
				"deal_representant": {
					"top_hits": {
						"_source": {
							"includes": [
								"content.ACCP_ACCP_PRFT_CNTR_DESC",
								"content.ACCP_ACCP_PRFT_CNTR_DESC_LV2",
								"content.ACCP_BUS_PARTIC_TYP_DESC",
								"content.ACCP_MAIN_BRKR_CTRY",
								"content.ACCP_MAIN_BRKR_LGL_CCAT",
								"content.ACCP_MAIN_CARR_LGL_CCAT",
								"content.ACCP_MAIN_CLI_CCAT",
								"content.ACCP_MAIN_CLI_CTRY",
								"content.ACCP_MAIN_CLI_GRP_LGL_NM",
								"content.ACCP_MAIN_LOB_ID_LST",
								"content.ACCP_TYP_OF_AGRMNT_DESC",
								"content.ACCP_UWRT_OBJ_STAT_DESC",
								"content.ACCP_UWRT_YR",
								"content.DOC_DEAL_NO"
							]
						},
						"size": 1
					}
				}
			},
			"composite": {
				"size": 20,
				"sources": [
					{
						"dealId": {
							"terms": {
								"field": "content.DOC_DEAL_NO"
							}
						}
					}
				],
				"after": {
					"dealId": ""
				}
			}
		}
	}

}
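The query above repeats the same deals_count block under every terms aggregation. When benchmarking the two variants, generating the sub-aggregation body from the simple one keeps them in sync. A minimal Python sketch that only builds the request body; add_deals_count and DEALS_COUNT are hypothetical names, and precision_threshold is written as a number here:

```python
import copy

# The cardinality sub-aggregation attached to every terms bucket above.
DEALS_COUNT = {
    "cardinality": {
        "field": "content.DOC_DEAL_NO.hash",
        "precision_threshold": 3000,
    }
}


def add_deals_count(aggs, skip=()):
    """Return a copy of an "aggs" object in which every terms aggregation,
    at every nesting level, gets a deals_count cardinality child.

    Aggregations whose names are in `skip` (and their children) are left
    unchanged. The input dict is not modified.
    """
    result = copy.deepcopy(aggs)
    for name, body in result.items():
        if name in skip or "terms" not in body:
            continue
        # Process existing children first, then put deals_count in front,
        # matching the layout of the query above.
        children = add_deals_count(body.get("aggs", {}), skip)
        body["aggs"] = {"deals_count": copy.deepcopy(DEALS_COUNT), **children}
    return result
```

Passing the simple query's "aggs" object with skip=("dealIds",) should place the cardinality sub-aggregations where the query above has them; the contentType, activityGroup, dealsCount and dealCards aggregations would still need to be added separately.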

Regarding the data, it's quite tricky to share (data privacy, plus getting a representative distribution).

The profile responses for the index with "eager_global_ordinals": true are 10MB and 52MB in size, so they are very hard to share.
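One way to make responses of that size shareable is to reduce them to per-aggregation timings averaged over all shards. A minimal Python sketch, assuming the full search response (with its "profile" section, which holds one entry per shard under "shards", each with an "aggregations" tree) has been parsed into a dict; walk and average_agg_times are hypothetical helpers:

```python
from collections import defaultdict


def walk(aggs, path=""):
    """Yield (path, time_in_nanos) for every aggregator in a profile tree."""
    for agg in aggs:
        name = f"{path}/{agg['description']}" if path else agg["description"]
        yield name, agg["time_in_nanos"]
        yield from walk(agg.get("children", []), name)


def average_agg_times(profile_response):
    """Average each aggregator's reported time across shards, in milliseconds.

    Aggregators are keyed by their path of names, e.g.
    "profitCenters/profitCenters_lv2/deals_count", so identically named
    sub-aggregations under different parents stay separate.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for shard in profile_response["profile"]["shards"]:
        for name, nanos in walk(shard.get("aggregations", [])):
            totals[name] += nanos
            counts[name] += 1
    averages = {name: totals[name] / counts[name] / 1e6 for name in totals}
    # Slowest aggregators first.
    return dict(sorted(averages.items(), key=lambda kv: kv[1], reverse=True))
```

For example, json.load the saved response and print the first ten entries. The timings are taken as reported in time_in_nanos; the sketch makes no assumption about whether a parent's time includes its children, so the averages are meant to be compared, not added up along a path.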

So I'm sharing just examples from one of the shards and one aggregation:

profile of simple aggregations - profitCenters
 {
            "type" : "StringTermsAggregatorFromFilters",
            "description" : "profitCenters",
            "time" : "69.7ms",
            "time_in_nanos" : 69730132,
            "breakdown" : {
              "reduce" : 0,
              "post_collection_count" : 1,
              "build_leaf_collector" : 69053246,
              "build_aggregation" : 671619,
              "build_aggregation_count" : 1,
              "build_leaf_collector_count" : 19,
              "post_collection" : 1482,
              "initialize" : 3785,
              "initialize_count" : 1,
              "reduce_count" : 0,
              "collect" : 0,
              "collect_count" : 0
            },
            "debug" : {
              "delegate" : "FilterByFilterAggregator",
              "delegate_debug" : {
                "segments_with_deleted_docs" : 0,
                "filters" : [
                  {
                    "results_from_metadata" : 0,
                    "query" : "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1:Group Finance",
                    "specialized_for" : "term"
                  },
                  {
                    "results_from_metadata" : 0,
                    "query" : "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1:Group Underwriting",
                    "specialized_for" : "term"
                  },
                  {
                    "results_from_metadata" : 0,
                    "query" : "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1:OU Corporate Solutions",
                    "specialized_for" : "term"
                  },
                  {
                    "results_from_metadata" : 0,
                    "query" : "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1:OU Reinsurance",
                    "specialized_for" : "term"
                  }
                ],
                "segments_counted" : 0,
                "segments_with_doc_count_field" : 0,
                "segments_collected" : 19
              },
              "built_buckets" : 1
            },
            "children" : [
              {
                "type" : "GlobalOrdinalsStringTermsAggregator",
                "description" : "profitCenters_lv2",
                "time" : "96.8ms",
                "time_in_nanos" : 96884845,
                "breakdown" : {
                  "reduce" : 0,
                  "post_collection_count" : 1,
                  "build_leaf_collector" : 6939080,
                  "build_aggregation" : 655973,
                  "build_aggregation_count" : 1,
                  "build_leaf_collector_count" : 76,
                  "post_collection" : 637,
                  "initialize" : 2242,
                  "initialize_count" : 1,
                  "reduce_count" : 0,
                  "collect" : 89286913,
                  "collect_count" : 267417
                },
                "debug" : {
                  "segments_with_multi_valued_ords" : 0,
                  "collection_strategy" : "remap using many bucket ords packed using [3/61] bits",
                  "segments_with_single_valued_ords" : 76,
                  "total_buckets" : 8,
                  "built_buckets" : 4,
                  "result_strategy" : "terms",
                  "has_filter" : false
                },
                "children" : [
                  {
                    "type" : "GlobalOrdinalsStringTermsAggregator",
                    "description" : "profitCenters_lv3",
                    "time" : "66.3ms",
                    "time_in_nanos" : 66350266,
                    "breakdown" : {
                      "reduce" : 0,
                      "post_collection_count" : 1,
                      "build_leaf_collector" : 5499671,
                      "build_aggregation" : 646200,
                      "build_aggregation_count" : 1,
                      "build_leaf_collector_count" : 76,
                      "post_collection" : 484,
                      "initialize" : 1327,
                      "initialize_count" : 1,
                      "reduce_count" : 0,
                      "collect" : 60202584,
                      "collect_count" : 267417
                    },
                    "debug" : {
                      "segments_with_multi_valued_ords" : 0,
                      "collection_strategy" : "remap using many bucket ords",
                      "segments_with_single_valued_ords" : 76,
                      "total_buckets" : 45,
                      "built_buckets" : 8,
                      "result_strategy" : "terms",
                      "has_filter" : false
                    },
                    "children" : [
                      {
                        "type" : "GlobalOrdinalsStringTermsAggregator",
                        "description" : "profitCenters_lv4",
                        "time" : "39.1ms",
                        "time_in_nanos" : 39197952,
                        "breakdown" : {
                          "reduce" : 0,
                          "post_collection_count" : 1,
                          "build_leaf_collector" : 3875097,
                          "build_aggregation" : 608910,
                          "build_aggregation_count" : 1,
                          "build_leaf_collector_count" : 76,
                          "post_collection" : 339,
                          "initialize" : 679,
                          "initialize_count" : 1,
                          "reduce_count" : 0,
                          "collect" : 34712927,
                          "collect_count" : 267417
                        },
                        "debug" : {
                          "segments_with_multi_valued_ords" : 0,
                          "collection_strategy" : "remap using many bucket ords",
                          "segments_with_single_valued_ords" : 76,
                          "total_buckets" : 120,
                          "built_buckets" : 45,
                          "result_strategy" : "terms",
                          "has_filter" : false
                        },
                        "children" : [
                          {
                            "type" : "GlobalOrdinalsStringTermsAggregator",
                            "description" : "profitCenters_lv5",
                            "time" : "23.9ms",
                            "time_in_nanos" : 23943806,
                            "breakdown" : {
                              "reduce" : 0,
                              "post_collection_count" : 1,
                              "build_leaf_collector" : 2066387,
                              "build_aggregation" : 471748,
                              "build_aggregation_count" : 1,
                              "build_leaf_collector_count" : 76,
                              "post_collection" : 153,
                              "initialize" : 87,
                              "initialize_count" : 1,
                              "reduce_count" : 0,
                              "collect" : 21405431,
                              "collect_count" : 267370
                            },
                            "debug" : {
                              "segments_with_multi_valued_ords" : 0,
                              "collection_strategy" : "remap using many bucket ords",
                              "segments_with_single_valued_ords" : 76,
                              "total_buckets" : 366,
                              "built_buckets" : 120,
                              "result_strategy" : "terms",
                              "has_filter" : false
                            }
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          },
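Each terms bucket that carries a deals_count sub-aggregation gets its own cardinality sketch, so the cost of the sub-aggregation variant grows with the number of buckets collected on a shard rather than with the size of the final response. The Elasticsearch cardinality docs state that a precision_threshold of c needs roughly c * 8 bytes and that thresholds above 40000 behave like 40000. A back-of-the-envelope Python sketch, using as example input the bucket counts of the profitCenters tree in the shard profile above (4 top-level filter buckets, then total_buckets of 8, 45, 120 and 366); cardinality_memory_ceiling is a hypothetical helper, and the result is a rough ceiling, not a measurement:

```python
MAX_PRECISION_THRESHOLD = 40_000  # documented cap; higher values act like 40000
BYTES_PER_THRESHOLD_UNIT = 8      # documented rule of thumb: about c * 8 bytes


def cardinality_memory_ceiling(bucket_counts, precision_threshold=3000):
    """Rough upper bound, in bytes, for one cardinality sketch per bucket.

    bucket_counts: number of buckets per nesting level on one shard, each
    bucket carrying its own cardinality sub-aggregation.
    """
    c = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return sum(bucket_counts) * c * BYTES_PER_THRESHOLD_UNIT


# Buckets per level of the profitCenters tree in the shard profile above.
levels = [4, 8, 45, 120, 366]
ceiling = cardinality_memory_ceiling(levels)
print(sum(levels), "sketches, up to", round(ceiling / 2**20, 1), "MiB")
```

The same estimate applies to every other terms aggregation that gets a deals_count child, per shard and per request, which is one way to see why the cost of the sub-aggregation variant grows with the number of buckets.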

I also have a script that averages the response times across all shards:

profile of sub-aggregations - profitCenters
          {
            "type" : "StringTermsAggregatorFromFilters",
            "description" : "profitCenters",
            "time" : "206.5ms",
            "time_in_nanos" : 206591154,
            "breakdown" : {
              "reduce" : 0,
              "post_collection_count" : 1,
              "build_leaf_collector" : 163919642,
              "build_aggregation" : 42643488,
              "build_aggregation_count" : 1,
              "build_leaf_collector_count" : 13,
              "post_collection" : 11354,
              "initialize" : 16670,
              "initialize_count" : 1,
              "reduce_count" : 0,
              "collect" : 0,
              "collect_count" : 0
            },
            "debug" : {
              "delegate" : "FilterByFilterAggregator",
              "delegate_debug" : {
                "segments_with_deleted_docs" : 0,
                "filters" : [
                  {
                    "results_from_metadata" : 0,
                    "query" : "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1:Group Finance",
                    "specialized_for" : "term"
                  },
                  {
                    "results_from_metadata" : 0,
                    "query" : "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1:Group Underwriting",
                    "specialized_for" : "term"
                  },
                  {
                    "results_from_metadata" : 0,
                    "query" : "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1:OU Corporate Solutions",
                    "specialized_for" : "term"
                  },
                  {
                    "results_from_metadata" : 0,
                    "query" : "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1:OU Reinsurance",
                    "specialized_for" : "term"
                  }
                ],
                "segments_counted" : 0,
                "segments_with_doc_count_field" : 0,
                "segments_collected" : 13
              },
              "built_buckets" : 1
            },
            "children" : [
              {
                "type" : "CardinalityAggregator",
                "description" : "deals_count",
                "time" : "19.3ms",
                "time_in_nanos" : 19338003,
                "breakdown" : {
                  "reduce" : 0,
                  "post_collection_count" : 1,
                  "build_leaf_collector" : 57628,
                  "build_aggregation" : 205439,
                  "build_aggregation_count" : 1,
                  "build_leaf_collector_count" : 52,
                  "post_collection" : 1177,
                  "initialize" : 97,
                  "initialize_count" : 1,
                  "reduce_count" : 0,
                  "collect" : 19073662,
                  "collect_count" : 267706
                },
                "debug" : {
                  "ordinals_collectors_used" : 0,
                  "ordinals_collectors_overhead_too_high" : 0,
                  "built_buckets" : 4,
                  "string_hashing_collectors_used" : 0,
                  "numeric_collectors_used" : 52,
                  "empty_collectors_used" : 0
                }
              },
              {
                "type" : "GlobalOrdinalsStringTermsAggregator",
                "description" : "profitCenters_lv2",
                "time" : "258.7ms",
                "time_in_nanos" : 258703065,
                "breakdown" : {
                  "reduce" : 0,
                  "post_collection_count" : 1,
                  "build_leaf_collector" : 3550652,
                  "build_aggregation" : 42417699,
                  "build_aggregation_count" : 1,
                  "build_leaf_collector_count" : 52,
                  "post_collection" : 7574,
                  "initialize" : 10993,
                  "initialize_count" : 1,
                  "reduce_count" : 0,
                  "collect" : 212716147,
                  "collect_count" : 267706
                },
                "debug" : {
                  "segments_with_multi_valued_ords" : 0,
                  "collection_strategy" : "remap using many bucket ords packed using [3/61] bits",
                  "segments_with_single_valued_ords" : 52,
                  "total_buckets" : 8,
                  "built_buckets" : 4,
                  "result_strategy" : "terms",
                  "has_filter" : false
                },
                "children" : [
                  {
                    "type" : "CardinalityAggregator",
                    "description" : "deals_count",
                    "time" : "17.9ms",
                    "time_in_nanos" : 17985976,
                    "breakdown" : {
                      "reduce" : 0,
                      "post_collection_count" : 1,
                      "build_leaf_collector" : 24617,
                      "build_aggregation" : 515266,
                      "build_aggregation_count" : 1,
                      "build_leaf_collector_count" : 52,
                      "post_collection" : 275,
                      "initialize" : 128,
                      "initialize_count" : 1,
                      "reduce_count" : 0,
                      "collect" : 17445690,
                      "collect_count" : 267706
                    },
                    "debug" : {
                      "ordinals_collectors_used" : 0,
                      "ordinals_collectors_overhead_too_high" : 0,
                      "built_buckets" : 8,
                      "string_hashing_collectors_used" : 0,
                      "numeric_collectors_used" : 52,
                      "empty_collectors_used" : 0
                    }
                  },
                  {
                    "type" : "GlobalOrdinalsStringTermsAggregator",
                    "description" : "profitCenters_lv3",
                    "time" : "202.5ms",
                    "time_in_nanos" : 202591469,
                    "breakdown" : {
                      "reduce" : 0,
                      "post_collection_count" : 1,
                      "build_leaf_collector" : 2862613,
                      "build_aggregation" : 41875454,
                      "build_aggregation_count" : 1,
                      "build_leaf_collector_count" : 52,
                      "post_collection" : 6537,
                      "initialize" : 8985,
                      "initialize_count" : 1,
                      "reduce_count" : 0,
                      "collect" : 157837880,
                      "collect_count" : 267706
                    },
                    "debug" : {
                      "segments_with_multi_valued_ords" : 0,
                      "collection_strategy" : "remap using many bucket ords",
                      "segments_with_single_valued_ords" : 52,
                      "total_buckets" : 50,
                      "built_buckets" : 8,
                      "result_strategy" : "terms",
                      "has_filter" : false
                    },
                    "children" : [
                      {
                        "type" : "CardinalityAggregator",
                        "description" : "deals_count",
                        "time" : "29.8ms",
                        "time_in_nanos" : 29870095,
                        "breakdown" : {
                          "reduce" : 0,
                          "post_collection_count" : 1,
                          "build_leaf_collector" : 20917,
                          "build_aggregation" : 2839577,
                          "build_aggregation_count" : 1,
                          "build_leaf_collector_count" : 52,
                          "post_collection" : 240,
                          "initialize" : 188,
                          "initialize_count" : 1,
                          "reduce_count" : 0,
                          "collect" : 27009173,
                          "collect_count" : 267706
                        },
                        "debug" : {
                          "ordinals_collectors_used" : 0,
                          "ordinals_collectors_overhead_too_high" : 0,
                          "built_buckets" : 50,
                          "string_hashing_collectors_used" : 0,
                          "numeric_collectors_used" : 52,
                          "empty_collectors_used" : 0
                        }
                      },
                      {
                        "type" : "GlobalOrdinalsStringTermsAggregator",
                        "description" : "profitCenters_lv4",
                        "time" : "131.5ms",
                        "time_in_nanos" : 131575247,
                        "breakdown" : {
                          "reduce" : 0,
                          "post_collection_count" : 1,
                          "build_leaf_collector" : 2061481,
                          "build_aggregation" : 38984315,
                          "build_aggregation_count" : 1,
                          "build_leaf_collector_count" : 52,
                          "post_collection" : 5327,
                          "initialize" : 7041,
                          "initialize_count" : 1,
                          "reduce_count" : 0,
                          "collect" : 90517083,
                          "collect_count" : 267706
                        },
                        "debug" : {
                          "segments_with_multi_valued_ords" : 0,
                          "collection_strategy" : "remap using many bucket ords",
                          "segments_with_single_valued_ords" : 52,
                          "total_buckets" : 129,
                          "built_buckets" : 50,
                          "result_strategy" : "terms",
                          "has_filter" : false
                        },
                        "children" : [
                          {
                            "type" : "CardinalityAggregator",
                            "description" : "deals_count",
                            "time" : "34.2ms",
                            "time_in_nanos" : 34267921,
                            "breakdown" : {
                              "reduce" : 0,
                              "post_collection_count" : 1,
                              "build_leaf_collector" : 20036,
                              "build_aggregation" : 5622350,
                              "build_aggregation_count" : 1,
                              "build_leaf_collector_count" : 52,
                              "post_collection" : 128,
                              "initialize" : 121,
                              "initialize_count" : 1,
                              "reduce_count" : 0,
                              "collect" : 28625286,
                              "collect_count" : 267666
                            },
                            "debug" : {
                              "ordinals_collectors_used" : 0,
                              "ordinals_collectors_overhead_too_high" : 0,
                              "built_buckets" : 129,
                              "string_hashing_collectors_used" : 0,
                              "numeric_collectors_used" : 52,
                              "empty_collectors_used" : 0
                            }
                          },
                          {
                            "type" : "GlobalOrdinalsStringTermsAggregator",
                            "description" : "profitCenters_lv5",
                            "time" : "69.7ms",
                            "time_in_nanos" : 69770848,
                            "breakdown" : {
                              "reduce" : 0,
                              "post_collection_count" : 1,
                              "build_leaf_collector" : 1135485,
                              "build_aggregation" : 33194860,
                              "build_aggregation_count" : 1,
                              "build_leaf_collector_count" : 52,
                              "post_collection" : 4525,
                              "initialize" : 4702,
                              "initialize_count" : 1,
                              "reduce_count" : 0,
                              "collect" : 35431276,
                              "collect_count" : 267666
                            },
                            "debug" : {
                              "segments_with_multi_valued_ords" : 0,
                              "collection_strategy" : "remap using many bucket ords",
                              "segments_with_single_valued_ords" : 52,
                              "deferred_aggregators" : [
                                "deals_count"
                              ],
                              "total_buckets" : 379,
                              "built_buckets" : 129,
                              "result_strategy" : "terms",
                              "has_filter" : false
                            },
                            "children" : [
                              {
                                "type" : "CardinalityAggregator",
                                "description" : "deals_count",
                                "time" : "34.4ms",
                                "time_in_nanos" : 34478860,
                                "breakdown" : {
                                  "reduce" : 0,
                                  "post_collection_count" : 1,
                                  "build_leaf_collector" : 51754,
                                  "build_aggregation" : 11399812,
                                  "build_aggregation_count" : 1,
                                  "build_leaf_collector_count" : 33,
                                  "post_collection" : 446,
                                  "initialize" : 150,
                                  "initialize_count" : 1,
                                  "reduce_count" : 0,
                                  "collect" : 23026698,
                                  "collect_count" : 267666
                                },
                                "debug" : {
                                  "ordinals_collectors_used" : 0,
                                  "ordinals_collectors_overhead_too_high" : 0,
                                  "built_buckets" : 379,
                                  "string_hashing_collectors_used" : 0,
                                  "numeric_collectors_used" : 33,
                                  "empty_collectors_used" : 0
                                }
                              }
                            ]
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          },

I also found that no caching was used after a whole round of testing, despite "size":0 and the forced flag ?request_cache=true.
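
The Elasticsearch shard request cache documentation makes three points that are relevant here:

  • Requests with a size greater than 0 are not cached.
  • The cache key is derived from the JSON request body, so the same aggregations serialized with a different key order can miss the cache.
  • Cached entries are invalidated when a refresh changes the shard's data. With a 10s refresh interval on an index that is still being written to, entries may therefore be short-lived.

The Python sketch below keeps request bodies byte-stable as a cheap precaution. It uses only the standard library, and the helper name is hypothetical, not part of any Elasticsearch client.

```python
import json
from urllib.parse import urlencode


def cacheable_search_request(index, body):
    """Build (path, payload) for an aggregation-only search that opts into the
    shard request cache, serializing the body deterministically."""
    # Requests with size > 0 are not cached; an omitted "size" means 10 hits.
    if body.get("size", 10) != 0:
        raise ValueError('set "size": 0, otherwise the response is not cached')
    path = f"/{index}/_search?" + urlencode({"request_cache": "true"})
    # Sorted keys and fixed separators: the same logical request always
    # produces the same bytes.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return path, payload


aggs = {"dealsCount": {"cardinality": {"field": "content.DOC_DEAL_NO.hash"}}}
a = {"size": 0, "aggs": aggs}
b = {"aggs": aggs, "size": 0}  # same request, different key order
print(cacheable_search_request("treaty_v4", a) == cacheable_search_request("treaty_v4", b))
```

This only helps if every caller sends its body through the same serializer.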

Looking at the profile of your sub-aggregations, the majority of the time seems to be spent collecting. I'm not very familiar with this area, and someone with more in-depth Elasticsearch experience may have more input here, but I assume this refers to collecting the data required to compute the aggregation.
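
To check this claim across many aggregators, you can compute each node's share of time spent in "collect" from the profile tree. The Python sketch below uses a hypothetical helper on a trimmed excerpt of the profile output above: profitCenters_lv3 and one of its deals_count children, with only the "collect" and "build_aggregation" breakdown entries kept. A parent terms aggregator's collect time appears to include its sub-aggregators' collection. For example, profitCenters_lv3 reports more collect time than the sum of its children's. So the shares overlap and should not be added up.

```python
# Trimmed excerpt of the profile output above (two aggregators only).
PROFILE_EXCERPT = {
    "type": "GlobalOrdinalsStringTermsAggregator",
    "description": "profitCenters_lv3",
    "time_in_nanos": 202591469,
    "breakdown": {"collect": 157837880, "build_aggregation": 41875454},
    "children": [
        {
            "type": "CardinalityAggregator",
            "description": "deals_count",
            "time_in_nanos": 29870095,
            "breakdown": {"collect": 27009173, "build_aggregation": 2839577},
        }
    ],
}


def collect_shares(node, depth=0):
    """Yield (depth, description, type, collect_share) for a profiled
    aggregator and, recursively, all of its children."""
    total = node["time_in_nanos"]
    collect = node["breakdown"].get("collect", 0)
    share = collect / total if total else 0.0
    yield depth, node["description"], node["type"], share
    for child in node.get("children", []):
        yield from collect_shares(child, depth + 1)


for depth, name, typ, share in collect_shares(PROFILE_EXCERPT):
    print(f"{'  ' * depth}{name} ({typ}): {share:.0%} of its time in collect")
```

In practice you would pass each entry of the profile's "aggregations" array instead of the hard-coded excerpt.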

There are a few things I can think of here that might give you some benefit:

  1. Look at your data nodes: do they have high disk IO?
    • If they have high disk IO, do they also have high latency?
    • These 2 issues might play a role in the high collect duration.
  2. Take a look at Collect Mode. It might improve performance if your sub-aggregations are running against every document rather than just the results of your main aggregation.
  3. Look into ways for your query to leverage caching. If your data is mostly static, caching results could help here.
    • Note: I don't really know much about caching, so I'm of little help here from a technical perspective.
  4. Look into upgrading Elasticsearch to the latest 8.x version. There have been significant improvements in many places that could benefit you.
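
When collect_mode is added by hand, nested levels are easy to miss. The Python sketch below uses a hypothetical helper (stdlib only) that walks an "aggs" dictionary and sets collect_mode on every terms aggregation at any depth. According to the Elasticsearch terms aggregation docs, breadth_first is already the default when a field's cardinality is larger than the requested size. The "deferred_aggregators" entry in the profile above also suggests deferral already happens at some levels, so the gain may be limited.

```python
import copy


def set_collect_mode(aggs, mode="breadth_first"):
    """Return a deep copy of an "aggs" dict in which every terms aggregation,
    at any nesting depth, has collect_mode set to `mode`."""
    result = copy.deepcopy(aggs)
    pending = [result]  # aggs-level dicts still to visit
    while pending:
        level = pending.pop()
        for agg in level.values():
            if "terms" in agg:
                agg["terms"]["collect_mode"] = mode
            # Sub-aggregations may be spelled "aggs" or "aggregations".
            for key in ("aggs", "aggregations"):
                if key in agg:
                    pending.append(agg[key])
    return result
```

Because the function works on a copy, the original query template stays untouched and the rewrite can be toggled per request.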

With "collect_mode": "breadth_first" in the query there was no change at first, but after a few hits the cache seems to have started working, so response times now oscillate between 1.5s and 2.2s.

# monitor cache per index
GET _stats/request_cache?human
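
To tell whether repeated requests actually hit the shard request cache, compare hit_count and miss_count in the request_cache section of this response before and after re-running the same query. A miss count that grows on every run means the cache is not being reused. The Python sketch below reads the parsed response. The helper name and the sample numbers are hypothetical.

```python
def request_cache_hit_ratios(stats):
    """Map index name -> (hits, misses, evictions, hit_ratio) from a parsed
    `GET _stats/request_cache` response, using the "total" section
    (primaries and replicas together)."""
    result = {}
    for index, index_stats in stats.get("indices", {}).items():
        rc = index_stats["total"]["request_cache"]
        hits, misses = rc["hit_count"], rc["miss_count"]
        lookups = hits + misses
        ratio = hits / lookups if lookups else 0.0
        result[index] = (hits, misses, rc["evictions"], ratio)
    return result


# Hypothetical sample shaped like the _stats response (numbers are made up).
SAMPLE = {
    "indices": {
        "treaty_v4": {
            "total": {
                "request_cache": {
                    "memory_size_in_bytes": 1024,
                    "evictions": 0,
                    "hit_count": 30,
                    "miss_count": 10,
                }
            }
        }
    }
}

for name, (hits, misses, evictions, ratio) in request_cache_hit_ratios(SAMPLE).items():
    print(f"{name}: {hits} hits, {misses} misses, {evictions} evictions, hit ratio {ratio:.0%}")
```

A rising evictions count alongside misses would instead point at the cache being too small.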

Problematic sub-aggregations using "breadth_first":
GET treaty_v4/_search
{
	"size": 0,
	"aggs": {
		"partners": {
			"terms": {
				"field": "content.ACCP_MAIN_CLI_CCAT.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 500
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"profitCenters": {
			"terms": {
				"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1",
				"collect_mode": "breadth_first",
				"size": 10
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				},
				"profitCenters_lv2": {
					"terms": {
						"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV2",
						"collect_mode": "breadth_first",
						"size": 20
					},
					"aggs": {
						"deals_count": {
							"cardinality": {
								"field": "content.DOC_DEAL_NO.hash",
								"precision_threshold": "3000"
							}
						},
						"profitCenters_lv3": {
							"terms": {
								"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV3",
								"collect_mode": "breadth_first",
								"size": 100
							},
							"aggs": {
								"deals_count": {
									"cardinality": {
										"field": "content.DOC_DEAL_NO.hash",
										"precision_threshold": "3000"
									}
								},
								"profitCenters_lv4": {
									"terms": {
										"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV4",
										"collect_mode": "breadth_first",
										"size": 100
									},
									"aggs": {
										"deals_count": {
											"cardinality": {
												"field": "content.DOC_DEAL_NO.hash",
												"precision_threshold": "3000"
											}
										},
										"profitCenters_lv5": {
											"terms": {
												"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV5",
												"size": 100
											},
											"aggs": {
												"deals_count": {
													"cardinality": {
														"field": "content.DOC_DEAL_NO.hash",
														"precision_threshold": "3000"
													}
												}
											}
										}
									}
								}
							}
						}
					}
				}
			}
		},
		"lobIdList": {
			"terms": {
				"field": "content.ACCP_MAIN_LOB_ID_LST.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 500
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"clientCountry": {
			"terms": {
				"field": "content.ACCP_MAIN_CLI_CTRY.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 1000
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"typeOfBusiness": {
			"terms": {
				"field": "content.ACCP_BUS_PARTIC_TYP_DESC_LV1",
				"collect_mode": "breadth_first",
				"size": 10
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				},
				"typeOfBusiness_lv2": {
					"terms": {
						"field": "content.ACCP_BUS_PARTIC_TYP_DESC_LV2",
						"collect_mode": "breadth_first",
						"size": 20
					},
					"aggs": {
						"deals_count": {
							"cardinality": {
								"field": "content.DOC_DEAL_NO.hash",
								"precision_threshold": "3000"
							}
						},
						"typeOfBusiness_lv3": {
							"terms": {
								"field": "content.ACCP_BUS_PARTIC_TYP_DESC_LV3",
								"size": 20
							},
							"aggs": {
								"deals_count": {
									"cardinality": {
										"field": "content.DOC_DEAL_NO.hash",
										"precision_threshold": "3000"
									}
								}
							}
						}
					}
				}
			}
		},
		"contentType": {
			"terms": {
				"field": "content.DOC_CNTNT_CLSFCN.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 20
			}
		},
		"clientGroup": {
			"terms": {
				"field": "content.ACCP_MAIN_CLI_GRP_LGL_NM.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 1000
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"reinsurer": {
			"terms": {
				"field": "content.ACCP_MAIN_CARR_LGL_CCAT.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 250
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"broker": {
			"terms": {
				"field": "content.ACCP_MAIN_BRKR_LGL_CCAT.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 1000
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"brokerCountry": {
			"terms": {
				"field": "content.ACCP_MAIN_BRKR_CTRY.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 150
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"division": {
			"terms": {
				"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV2",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 50
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"uwrtYear": {
			"terms": {
				"field": "content.ACCP_UWRT_YR.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "desc"
				},
				"size": 150
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"uwrtStatus": {
			"terms": {
				"field": "content.ACCP_UWRT_OBJ_STAT_DESC.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 50
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				}
			}
		},
		"typeOfAgreement": {
			"terms": {
				"field": "content.ACCP_TYP_OF_AGRMNT_DESC_LV2.keyword",
				"collect_mode": "breadth_first",
				"order": {
					"_key": "asc"
				},
				"size": 10
			},
			"aggs": {
				"deals_count": {
					"cardinality": {
						"field": "content.DOC_DEAL_NO.hash",
						"precision_threshold": "3000"
					}
				},
				"typeOfAgreement_lv3": {
					"terms": {
						"field": "content.ACCP_TYP_OF_AGRMNT_DESC_LV3.keyword",
						"collect_mode": "breadth_first",
						"size": 20
					},
					"aggs": {
						"deals_count": {
							"cardinality": {
								"field": "content.DOC_DEAL_NO.hash",
								"precision_threshold": "3000"
							}
						}
					}
				}
			}
		},
		"dealIds": {
			"terms": {
				"field": "content.DOC_DEAL_NO.keyword",
				"order": {
					"_key": "asc"
				},
				"size": 100
			}
		},
		"dealsCount": {
			"cardinality": {
				"field": "content.DOC_DEAL_NO.hash",
				"precision_threshold": 40000
			}
		},
		"dealCards": {
			"aggs": {
				"deal_document_names": {
					"terms": {
						"field": "content.DOC_DOC_NM.keyword",
						"size": 20
					}
				},
				"deal_representant": {
					"top_hits": {
						"_source": {
							"includes": [
								"content.ACCP_ACCP_PRFT_CNTR_DESC",
								"content.ACCP_ACCP_PRFT_CNTR_DESC_LV2",
								"content.ACCP_BUS_PARTIC_TYP_DESC",
								"content.ACCP_MAIN_BRKR_CTRY",
								"content.ACCP_MAIN_BRKR_LGL_CCAT",
								"content.ACCP_MAIN_CARR_LGL_CCAT",
								"content.ACCP_MAIN_CLI_CCAT",
								"content.ACCP_MAIN_CLI_CTRY",
								"content.ACCP_MAIN_CLI_GRP_LGL_NM",
								"content.ACCP_MAIN_LOB_ID_LST",
								"content.ACCP_TYP_OF_AGRMNT_DESC",
								"content.ACCP_UWRT_OBJ_STAT_DESC",
								"content.ACCP_UWRT_YR",
								"content.DOC_DEAL_NO"
							]
						},
						"size": 1
					}
				}
			},
			"composite": {
				"size": 20,
				"sources": [
					{
						"dealId": {
							"terms": {
								"field": "content.DOC_DEAL_NO"
							}
						}
					}
				],
				"after": {
					"dealId": ""
				}
			}
		}
	}
}

I/O operations during the search (on one of the data nodes): the observed peak correlates with the moment the cache started working. When hitting the same aggregation again and again later, no such behaviour was observed.

Anyway, thanks for this; I was not aware of the collect mode functionality. I'll try to understand it and the other options you listed in more detail.
If anyone else has more experience with profiling, I would also appreciate their input.

Ideally, I would like to find a solution where I can use the full "precision_threshold": 40000 for the cardinality sub-aggregations with an overall response time under 1s. That is not the case now: the query is set to "precision_threshold": 3000 and still takes 1.6-2.3s.
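
The cardinality aggregation docs make two statements that help size the cost of raising the threshold. First, for a precision_threshold of c, the HyperLogLog++ implementation needs about c * 8 bytes. Second, values above 40000 behave like 40000. The back-of-the-envelope Python sketch below applies that to the built_buckets of the nested deals_count aggregators in the profile above (8, 50, 129, 379), using a hypothetical helper. It is a rough per-shard upper bound that covers only those four levels, not the other top-level aggregations. Buckets with few distinct values may allocate less.

```python
BYTES_PER_UNIT = 8       # docs: about c * 8 bytes per cardinality sketch
MAX_THRESHOLD = 40000    # higher values behave like 40000


def cardinality_memory_upper_bound(bucket_counts, precision_threshold):
    """Rough upper bound in bytes for one cardinality sub-aggregation per
    bucket, summed over all listed bucket counts."""
    c = min(precision_threshold, MAX_THRESHOLD)
    return sum(bucket_counts) * c * BYTES_PER_UNIT


# built_buckets of the nested deals_count aggregators in the profile excerpt
PROFILE_BUCKETS = [8, 50, 129, 379]

for c in (3000, 40000):
    mb = cardinality_memory_upper_bound(PROFILE_BUCKETS, c) / 1e6
    print(f"precision_threshold={c}: up to ~{mb:.1f} MB")
```

The bound grows linearly with the threshold, so moving from 3000 to 40000 multiplies it by about 13 for the same buckets.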

Could you tell us what type of storage you're using? The IO rate is fairly low, but the latency is fairly high.

Hi @BenB196,
we use StandardSSD_LRS disks provisioned through disk.csi.azure.com, all running on Kubernetes in Azure. The VM size is Standard_D16s_v5 (full specs on azureprice.net).

Is there anything you would tweak at the OS level? (Changing it now might be difficult if it requires recreating the nodes, since we have ~7+TB of data stored in the cluster.)

Elasticsearch can be very IO intensive and use a lot of small random IO operations. I had a look at the performance characteristics for StandardSSD_LRS and it says:

  • Summary: Designed to provide consistent performance for low IOPS workloads. Delivers better availability and latency compared to HDD Disks.
  • Workload: Web servers, low IOPS application servers, lightly used enterprise applications, and Dev/Test
  • Max IOPS: Up to 500 IOPS

That does not sound suitable for an IO-intensive application like Elasticsearch. If you have so much data stored in the cluster that it does not fit in the OS page cache, I would not be surprised if storage is the bottleneck. I would recommend switching to Premium SSDs and seeing what difference that makes.
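
The page-cache point can be made concrete with rough arithmetic. The Python sketch below rests on three assumptions. First, data is spread evenly over the 17 data nodes. Second, the ~7 TB mentioned earlier is what sits on disk. Third, a 31 GB heap leaves the rest of the 64 GB RAM to the page cache; that heap size is a common ceiling but an assumption here, not a value stated in the thread. In Kubernetes the container memory limit may leave even less. The helper name is hypothetical.

```python
GB = 10**9


def page_cache_coverage(total_data_bytes, data_nodes, ram_bytes, heap_bytes):
    """Fraction of each data node's on-disk data that could sit in the OS
    page cache, assuming even data distribution and that all non-heap RAM
    is available as page cache."""
    per_node = total_data_bytes / data_nodes
    cache = ram_bytes - heap_bytes
    return min(1.0, cache / per_node)


# ~7 TB over 17 data nodes with 64 GB RAM each (from this thread);
# the 31 GB heap is an assumption.
coverage = page_cache_coverage(7000 * GB, 17, 64 * GB, 31 * GB)
print(f"~{coverage:.0%} of each node's data fits in the page cache")
```

With only a small fraction cacheable, most aggregation reads that touch cold segments would go to disk, which is where the 500 IOPS limit would matter.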

Hi Christian,
thanks for pointing that out and for joining the discussion. Much appreciated.

So would you go for the P50 tier at most? (I don't immediately see a difference from P40; I will check.)
It might be related, but from what I understand our issue is more with latency than with I/O.
I do not know how many IOPS your use case requires if that turns out to be the bottleneck. I am also not very familiar with Azure storage, but would recommend you check disk IO and await on the nodes and then improve the storage and see what impact it makes. I think you will need to test and see.

If available in your region, Premium SSD v2 might be a better and simpler choice than Premium SSD (but it doesn't appear to be available in all regions).

Thank you Ben, Christian.

If I want to prove that faster disks are the right move for my case (heavy aggregations with cardinality sub-aggregations), are there other ways to confirm this beforehand?

I was executing the heavy query from Kibana Dev Tools (up to 2.5s "took" time for a single query) while watching the I/O and latency charts of a randomly selected data node, also in Kibana.
I also checked the profile API, but only for information; the results differ between aggregations (as can be seen above).
It looks like our current disks should support 600 IOPS, which we do not seem to be hitting.

Thanks

Run iostat -x -d 1 while you are querying and see what is reported. Do you see long latencies or high await in the stats? Does disk utilisation increase? As you have a large number of data nodes, you may need to run this on quite a few of them, as not all nodes will necessarily be involved in every query.

Thanks Christian.
Unfortunately we cannot run this exact command on our current storage machines, but we will try faster disks and see if there is any visible improvement.
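
As a stand-in for iostat -x, the same core numbers can be derived from /proc/diskstats, whose field layout is documented in the Linux kernel's iostats documentation. The Python sketch below computes:

  • await: total read and write milliseconds divided by completed I/Os over the interval.
  • util: the share of wall-clock time the device had I/O in flight.

These match how iostat defines those columns. The sketch assumes Python 3 is available wherever /proc/diskstats can be read, for example a debug pod on the node. The helper names and any device name you pass to watch() are hypothetical.

```python
import time

# /proc/diskstats columns used here: 2 device name, 3 reads completed,
# 6 ms spent reading, 7 writes completed, 10 ms spent writing,
# 12 ms spent doing I/O. Newer kernels append more columns after these.


def parse_diskstats(text):
    """Map device name -> (reads, read_ms, writes, write_ms, io_ms)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # not a full diskstats line
        stats[f[2]] = tuple(int(f[i]) for i in (3, 6, 7, 10, 12))
    return stats


def io_summary(before, after, interval_s):
    """iostat-like IOPS, average await (ms) and %util between two samples."""
    reads = after[0] - before[0]
    read_ms = after[1] - before[1]
    writes = after[2] - before[2]
    write_ms = after[3] - before[3]
    io_ms = after[4] - before[4]
    ops = reads + writes
    return {
        "iops": ops / interval_s,
        "await_ms": (read_ms + write_ms) / ops if ops else 0.0,
        "util_pct": 100.0 * io_ms / (interval_s * 1000.0),
    }


def watch(device, interval_s=1.0, rounds=10, path="/proc/diskstats"):
    """Print one summary line per interval for `device` (Linux only)."""
    with open(path) as fh:
        prev = parse_diskstats(fh.read())[device]
    for _ in range(rounds):
        time.sleep(interval_s)
        with open(path) as fh:
            cur = parse_diskstats(fh.read())[device]
        s = io_summary(prev, cur, interval_s)
        print(f"{device}: {s['iops']:.0f} IOPS, await {s['await_ms']:.1f} ms, "
              f"util {s['util_pct']:.0f}%")
        prev = cur
```

Calling watch() with the data disk's device name while the heavy query runs gives per-second numbers comparable to the iostat columns Christian asked about.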

Just out of curiosity, I rewrote the same query (17 aggregations, each with a cardinality sub-aggregation) to use a simple terms sub-aggregation (10 buckets returned) instead, and executed it.
The response time is even worse: 5.1s (compared to best run with cardinality sub-agg = 2.5s).

So it turns out that I need to optimize sub-aggregations in general, whether cardinality or terms. I'm already using "collect_mode": "breadth_first" and the murmur3 plugin for the inner sub-aggregation.

I have found a solution of sorts (so far it doesn't fully fit our design, but it works well): the _msearch (multi search) endpoint.

With each aggregation rewritten as a separate search, I was able to get a 715ms response. This is a huge improvement, because the searches can be executed in parallel (at least as far as I understand).

GET treaty_v4/_msearch
{}
{"size":0,"aggregations":{"agg":{"terms": {"field": "content.ACCP_MAIN_CLI_CCAT.keyword", "order": {"_key": "asc"}, "size": 500}, "aggs": {"deals_count": {"cardinality": {"field": "content.DOC_DEAL_NO.hash"}}}}}}
{}
{"size":0,"aggregations":{"agg":{"terms": {"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV1", "size": 10}, "aggs": {"deals_count": {"cardinality": {"field": "content.DOC_DEAL_NO.hash"}}, "profitCenters_lv2": {"terms": {"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV2", "size": 20}, "aggs": {"deals_count": {"cardinality": {"field": "content.DOC_DEAL_NO.hash"}}, "profitCenters_lv3": {"terms": {"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV3", "size": 100}, "aggs": {"deals_count": {"cardinality": {"field": "content.DOC_DEAL_NO.hash"}}, "profitCenters_lv4": {"terms": {"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV4", "size": 100}, "aggs": {"deals_count": {"cardinality": {"field": "content.DOC_DEAL_NO.hash"}}, "profitCenters_lv5": {"terms": {"field": "content.ACCP_ACCP_PRFT_CNTR_DESC_LV5", "size": 100}, "aggs": {"deals_count": {"cardinality": {"field": "content.DOC_DEAL_NO.hash"}}}}}}}}}}}}}}
... etc ...
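
The split can be automated. The Python sketch below uses only the standard json module; to_msearch_body and merge_aggregations are hypothetical helpers, not Elasticsearch client APIs. It turns the top-level "aggs" of a combined query into an _msearch NDJSON body, one size:0 search per aggregation, and merges the per-search results back. It keeps each aggregation's real name instead of a generic "agg", so the merged result has the same keys a single search would return. The multi search API returns one entry per search in its "responses" array, in request order. Each entry is either a normal search response or an object with an "error".

```python
import json


def to_msearch_body(aggs, index=None, query=None):
    """Split top-level aggregations into one size:0 search each and return
    the newline-delimited JSON body for _msearch."""
    lines = []
    for name, agg in aggs.items():
        header = {"index": index} if index else {}  # {} = index from the URL
        search = {"size": 0, "aggs": {name: agg}}
        if query is not None:
            search["query"] = query  # every sub-search repeats the same filter
        lines.append(json.dumps(header))
        lines.append(json.dumps(search))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def merge_aggregations(msearch_response):
    """Combine the per-search "aggregations" objects of an _msearch response
    into one dict, as a single search would have returned them."""
    merged = {}
    for item in msearch_response["responses"]:
        if "error" in item:
            raise RuntimeError(f"sub-search failed: {item['error']}")
        merged.update(item.get("aggregations", {}))
    return merged
```

The degree of parallelism can be bounded with the max_concurrent_searches query parameter of _msearch. If the original request had a query, pass it as `query` so every sub-search is filtered the same way.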