Optimize a global aggregation that became slower after increasing the number of terms sub-aggregations

We have an index with approximately 500,000 documents, 3 shards and 3 replicas.
This is the mapping:

{
	"mappings": {
		"_doc": {
			"dynamic_templates": [{
					"attr_template": {
						"mapping": {
							"type": "text",
							"index_options": "freqs",
							"fields": {
								"facet": {
									"index": true,
									"store": true,
									"type": "keyword",
									"eager_global_ordinals": true
								}
							}
						},
						"path_match": "attributes.*"
					}
				},
				{
					"num_attr_template": {
						"mapping": {
							"type": "float"
						},
						"path_match": "numerics.*"
					}
				}
			],
			"properties": {
				"brand": {
					"properties": {
						"facet": {
							"type": "keyword",
							"eager_global_ordinals": true,
							"index": true,
							"store": true
						}
					}
				},
				"category_named_path": {
					"type": "text",
					"norms": false,
					"index_options": "freqs",
					"analyzer": "path-analyzer",
					"search_analyzer": "keyword",
					"fielddata": true
				},
				"everytext": {
					"type": "text",
					"index": true,
					"analyzer": "language_index_analyzer",
					"search_analyzer": "language_search_analyzer",
					"fields": {
						"stopwords": {
							"type": "text",
							"index": true,
							"analyzer": "stopwords_index_analyzer",
							"search_analyzer": "stopwords_index_analyzer"
						}
					}
				},
				"numerics": {
					"properties": {
						"a_numeric": {
							"type": "float"
						}
					}
				},
				"price": {
					"properties": {
						"current": {
							"type": "float",
							"store": true
						}
					}
				}
			}
		}
	},
	"settings": {
		"index": {
			"analysis": {
				"analyzer": {
					"justlowercased": {
						"filter": [
							"lowercase"
						],
						"tokenizer": "keyword"
					},
					"path-analyzer": {
						"type": "custom",
						"tokenizer": "path-tokenizer"
					},
					"language_index_analyzer": {
						"filter": [
							"language_stop",
							"word_delimeter_filter",
							"icu_folding",
							"language_stemmer"
						],
						"tokenizer": "standard"
					},
					"stopwords_index_analyzer": {
						"filter": [
							"language_stop"
						],
						"tokenizer": "standard"
					},
					"language_search_analyzer": {
						"filter": [
							"language_stop",
							"word_delimeter_filter",
							"icu_folding",
							"language_stemmer"
						],
						"tokenizer": "standard"
					}
				},
				"filter": {
					"language_stop": {
						"type": "stop",
						"stopwords": "english"
					},
					"word_delimeter_filter": {
						"type": "word_delimiter",
						"preserve_original": "true",
						"split_on_numerics": "true",
						"catenate_words": "true",
						"generate_number_parts": "true",
						"generate_word_parts": "true"
					},
					"language_stemmer": {
						"type": "stemmer",
						"language": "light_english"
					}
				},
				"tokenizer": {
					"path-tokenizer": {
						"delimiter": ".",
						"type": "path_hierarchy"
					}
				}
			},
			"max_result_window": "650000",
			"number_of_replicas": 3,
			"number_of_shards": 3,
			"refresh_interval": "5s"
		}
	}
}

The basic aggregation query is the following:

{
	"aggregations": {
		"all": {
			"aggregations": {
				"querystring": {
					"aggregations": {
						"attribute1": {
							"aggregations": {
								"field": {
									"terms": {
										"field": "attributes.attribute1.facet",
										"size": 1000
									}
								}
							},
							"filter": {
								"bool": {}
							}
						},
						....
						"attribute88": {
							"aggregations": {
								"field": {
									"terms": {
										"field": "attributes.attribute88.facet",
										"size": 1000
									}
								}
							},
							"filter": {
								"bool": {}
							}
						},
						"brand": {
							"aggregations": {
								"field": {
									"terms": {
										"field": "brand.facet",
										"size": 1000
									}
								}
							},
							"filter": {
								"bool": {}
							}
						}
					},
					"filter": {
						"bool": {
							"must": {
								"query_string": {
									"fields": ["everytext"],
									"query": "(this AND that)"
								}
							}
						}
					}
				}
			},
			"global": {}
		},
		"category": {
			"aggregations": {
				"field": {
					"terms": {
						"exclude": ".*\\..*\\..*\\..*\\..*\\..*\\..*",
						"field": "category_named_path",
						"size": 10000
					}
				}
			},
			"filter": {
				"bool": {
					"must": {
						"query_string": {
							"fields": ["everytext"],
							"query": "(this AND that)"
						}
					}
				}
			}
		},
		"count": {
			"aggregations": {
				"count": {
					"value_count": {
						"field": "_id"
					}
				}
			},
			"filter": {
				"bool": {
					"must": {
						"query_string": {
							"fields": ["everytext"],
							"query": "(this AND that)"
						}
					}
				}
			}
		},
		"discount": {
			"aggregations": {
				"field": {
					"stats": {
						"field": "numerics.discount"
					}
				}
			},
			"filter": {
				"bool": {
					"must": {
						"query_string": {
							"fields": ["everytext"],
							"query": "(this AND that)"
						}
					}
				}
			}
		},
		"price": {
			"aggregations": {
				"field": {
					"stats": {
						"field": "price.current"
					}
				}
			},
			"filter": {
				"bool": {
					"must": {
						"query_string": {
							"fields": ["everytext"],
							"query": "(this AND that)"
						}
					}
				}
			}
		}
	},
	"from": 0,
	"size": 0
}

Every attribute aggregation has its own filter. On top of the shared query part, this filter is built from the attributes the user has filtered on in the UI, using the following logic: apply every filter except the one for the attribute being aggregated. For example, attribute1="attribute1 value", attribute2="attribute2 value", price > 10.0 will produce:

{
	"aggregations": {
		"all": {
			"aggregations": {
				"querystring": {
					"aggregations": {
						"attribute1": {
							"aggregations": {
								"field": {
									"terms": {
										"field": "attributes.attribute1.facet",
										"size": 1000
									}
								}
							},
							"filter": {
								"bool": {
									"must": [{
											"terms": {
												"attributes.attribute2.facet": ["attribute2 value"]
											}
										},
										{
											"range": {
												"price.current": {
													"gt": 10
												}
											}
										}
									]
								}
							}
						},
						"attribute2": {
							"aggregations": {
								"field": {
									"terms": {
										"field": "attributes.attribute2.facet",
										"size": 1000
									}
								}
							},
							"filter": {
								"bool": {
									"must": [{
											"terms": {
												"attributes.attribute1.facet": ["attribute1 value"]
											}
										},
										{
											"range": {
												"price.current": {
													"gt": 10
												}
											}
										}
									]
								}
							}
						},
						....
						"brand": {
							"aggregations": {
								"field": {
									"terms": {
										"field": "brand.facet",
										"size": 1000
									}
								}
							},
							"filter": {
								"bool": {
									"must": [{
											"terms": {
												"attributes.attribute1.facet": ["attribute1 value"]
											}
										},
										{
											"terms": {
												"attributes.attribute2.facet": ["attribute2 value"]
											}
										},
										{
											"range": {
												"price.current": {
													"gt": 10
												}
											}
										}
									]
								}
							}
						}
					},
					"filter": {
						"bool": {
							"must": {
								"query_string": {
									"fields": ["everytext"],
									"query": "(this AND that)"
								}
							}
						}
					}
				}
			},
			"global": {}
		},
		"category": {
			"aggregations": {
				"field": {
					"terms": {
						"exclude": ".*\\..*\\..*\\..*\\..*\\..*\\..*",
						"field": "category_named_path",
						"size": 10000
					}
				}
			},
			"filter": {
				"bool": {
					"must": [{
							"query_string": {
								"fields": ["everytext"],
								"query": "(this AND that)"
							}
						}, {
							"terms": {
								"attributes.attribute1.facet": ["attribute1 value"]
							}
						},
						{
							"terms": {
								"attributes.attribute2.facet": ["attribute2 value"]
							}
						},
						{
							"range": {
								"price.current": {
									"gt": 10
								}
							}
						}
					]
				}
			}
		},
		"count": {
			"aggregations": {
				"count": {
					"value_count": {
						"field": "_id"
					}
				}
			},
			"filter": {
				"bool": {
					"must": [{
							"query_string": {
								"fields": ["everytext"],
								"query": "(this AND that)"
							}
						}, {
							"terms": {
								"attributes.attribute1.facet": ["attribute1 value"]
							}
						},
						{
							"terms": {
								"attributes.attribute2.facet": ["attribute2 value"]
							}
						},
						{
							"range": {
								"price.current": {
									"gt": 10
								}
							}
						}
					]
				}
			}
		},
		"discount": {
			"aggregations": {
				"field": {
					"stats": {
						"field": "numerics.discount"
					}
				}
			},
			"filter": {
				"bool": {
					"must": [{
							"query_string": {
								"fields": ["everytext"],
								"query": "(this AND that)"
							}
						}, {
							"terms": {
								"attributes.attribute1.facet": ["attribute1 value"]
							}
						},
						{
							"terms": {
								"attributes.attribute2.facet": ["attribute2 value"]
							}
						},
						{
							"range": {
								"price.current": {
									"gt": 10
								}
							}
						}
					]
				}
			}
		},
		"price": {
			"aggregations": {
				"field": {
					"stats": {
						"field": "price.current"
					}
				}
			},
			"filter": {
				"bool": {
					"must": [{
							"query_string": {
								"fields": ["everytext"],
								"query": "(this AND that)"
							}
						}, {
							"terms": {
								"attributes.attribute1.facet": ["attribute1 value"]
							}
						},
						{
							"terms": {
								"attributes.attribute2.facet": ["attribute2 value"]
							}
						},
						{
							"range": {
								"price.current": {
									"gt": 10
								}
							}
						}
					]
				}
			}
		}
	},
	"from": 0,
	"size": 0
}
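
For readers who want the filter logic spelled out, here is a minimal sketch of how the per-attribute filters in the request above could be generated. It is written in Python purely for illustration (our real code is Go), and the helper names `facet_filter` and `facet_aggregation` are made up for this example.

```python
def facet_filter(facet, selected):
    """Build the bool filter for one facet: every selected clause except its own."""
    clauses = [clause for name, clause in selected.items() if name != facet]
    # With nothing left to apply, fall back to the empty bool used in the basic query.
    return {"bool": {"must": clauses}} if clauses else {"bool": {}}


def facet_aggregation(facet, field, selected, size=1000):
    """Wrap a terms aggregation on `field` in the facet's own filter aggregation."""
    return {
        "aggregations": {"field": {"terms": {"field": field, "size": size}}},
        "filter": facet_filter(facet, selected),
    }


# The selections from the example: attribute1, attribute2 and price > 10.
selected = {
    "attribute1": {"terms": {"attributes.attribute1.facet": ["attribute1 value"]}},
    "attribute2": {"terms": {"attributes.attribute2.facet": ["attribute2 value"]}},
    "price": {"range": {"price.current": {"gt": 10}}},
}

attribute1_agg = facet_aggregation("attribute1", "attributes.attribute1.facet", selected)
brand_agg = facet_aggregation("brand", "brand.facet", selected)
```

`attribute1_agg` carries only the attribute2 and price clauses, while `brand_agg` carries all three, which matches the request shown above.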

About 80% of the queries executed in less than 100ms while we had only ten attributes in the aggregation. Now we have 88, and that share has dropped to 50%.

I've tried several strategies to get back to the previous performance.
We have 3 master nodes, 2 client nodes and 5 data nodes. CPU usage on the data nodes increased, so I added 2 more. CPU usage is now back to the previous level, but performance still is not.

I tried removing the global aggregation and replacing it with one query per attribute aggregation, all sent together via /_msearch. Performance got even worse.
I also tried the same logic with the parallelisation done in our code instead: one /_search request per attribute, issued from concurrent goroutines.
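
To make the /_msearch attempt concrete, here is a rough sketch of how such a request body can be assembled. It is Python for illustration only (the production code is Go); the index name `products` and the helpers `single_facet_search` and `msearch_body` are hypothetical, and the exact query shape we used may have differed. The body format itself (one header line and one search line per query, newline-delimited, with a trailing newline) is what /_msearch expects.

```python
import json


def single_facet_search(query_string, facet_clauses, field, size=1000):
    """One standalone search replacing a single sub-aggregation of the global aggregation."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [{"query_string": {"fields": ["everytext"], "query": query_string}}],
                "filter": facet_clauses,
            }
        },
        "aggregations": {"field": {"terms": {"field": field, "size": size}}},
    }


def msearch_body(index, searches):
    """Serialize searches as NDJSON: a header line, then the search body, per query."""
    lines = []
    for search in searches:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps(search))
    # The multi search API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


price_clause = {"range": {"price.current": {"gt": 10}}}
searches = [
    single_facet_search("(this AND that)", [price_clause], "attributes.attribute1.facet"),
    single_facet_search("(this AND that)", [price_clause], "brand.facet"),
]
body = msearch_body("products", searches)  # "products" is a placeholder index name
```

With 88 attributes this produces 88 header/body pairs in one request, sent with the `application/x-ndjson` content type.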

The "eager_global_ordinals" setting was not present when we increased the number of sub-aggregations. Adding it brought no improvement.

I could try to group together all the global sub-aggregations that share the same filter and run separate queries only for the attributes whose filters differ, but this could end up being basically the same as the /_msearch solution.
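
A minimal sketch of that grouping idea, in Python for illustration (not our actual Go code). It is shown here as filter aggregations that could sit inside one request; the same grouping would apply if each group became its own query. The names `group_facets_by_filter`, `grouped_aggregations` and the `groupN` aggregation keys are invented for this example, and I have not measured whether it is any faster.

```python
import json
from collections import defaultdict


def group_facets_by_filter(facet_fields, selected):
    """Map each distinct filter (serialized as JSON) to the facets that share it."""
    groups = defaultdict(list)
    for facet, field in facet_fields.items():
        # Same rule as before: every selected clause except the facet's own.
        clauses = [clause for name, clause in selected.items() if name != facet]
        groups[json.dumps(clauses, sort_keys=True)].append((facet, field))
    return groups


def grouped_aggregations(facet_fields, selected, size=1000):
    """Emit one filter aggregation per distinct filter, with one terms sub-aggregation per facet."""
    aggs = {}
    for i, (key, members) in enumerate(group_facets_by_filter(facet_fields, selected).items()):
        clauses = json.loads(key)
        aggs[f"group{i}"] = {
            "filter": {"bool": {"must": clauses}} if clauses else {"bool": {}},
            "aggregations": {
                facet: {"terms": {"field": field, "size": size}} for facet, field in members
            },
        }
    return aggs


facet_fields = {
    "attribute1": "attributes.attribute1.facet",
    "attribute2": "attributes.attribute2.facet",
    "attribute3": "attributes.attribute3.facet",
    "brand": "brand.facet",
}
selected = {
    "attribute1": {"terms": {"attributes.attribute1.facet": ["attribute1 value"]}},
    "price": {"range": {"price.current": {"gt": 10}}},
}
aggs = grouped_aggregations(facet_fields, selected)
```

Every facet the user has not filtered on gets exactly the same filter, so the number of groups is at most the number of facets with an active selection plus one. In the example, attribute1 is alone in one group and attribute2, attribute3 and brand share the other.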

Friendly hint: more than 500 lines of unformatted JSON is probably going to put most people off from trying to answer a question.

@Mark_Harwood, I know, but adding formatting will hit the length limit of the post.

Actually, triple backticks work for formatting, without the leading spaces that get added when I select preformatted text.

These are the performance figures.
With 10 sub-aggregations:

  • 0.5 percentile: 50ms
  • 0.9 percentile: 85ms
  • 0.99 percentile: 160ms

With 88 sub-aggregations:

  • 0.5 percentile: 150ms
  • 0.9 percentile: 250ms
  • 0.99 percentile: 600ms

All the strategies I applied either showed no performance improvement or made things worse.
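
To put those numbers side by side, a quick calculation of the slowdown per percentile against the growth in sub-aggregation count (plain arithmetic on the figures above, nothing measured separately):

```python
# Latencies in milliseconds, keyed by percentile as listed above.
before = {"0.5": 50, "0.9": 85, "0.99": 160}   # 10 sub-aggregations
after = {"0.5": 150, "0.9": 250, "0.99": 600}  # 88 sub-aggregations

slowdown = {p: round(after[p] / before[p], 2) for p in before}
growth = 88 / 10

print(slowdown)  # {'0.5': 3.0, '0.9': 2.94, '0.99': 3.75}
print(growth)    # 8.8
```

So latency grew roughly 3x to 3.75x while the number of sub-aggregations grew 8.8x.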
