Terms aggregation with nested objects performs poorly

Hello!

For one of our scenarios we need to run a terms aggregation with a script over nested objects, so that ids are extracted only from certain nested objects (selected by the mention type and identity data passed in). Basic load testing shows that this script is very slow even on a set of 200,000 root objects with 5-10 nested objects each. Plain terms / composite aggregations on the nested objects are fine, 1-3 seconds at worst, but with the script in the terms aggregation I get a stable 14+ seconds of execution.

I swapped the HashSet for an array to collect results, and even replaced the whole for loop over the input params with a comparison against a static value, but the numbers hardly changed.

We cannot use a composite aggregation to get exact numbers (we need counts of root objects, not nested ones, and that is problematic with composite), and it is clumsy because of size limits, so we are stuck with terms + script.

Can I speed it up in some way? What can I tweak to make it run faster? Most concerning is that simple update scripts on nested objects run very fast compared to this. Or is this just a very bad scenario for Elasticsearch?

We use Elasticsearch for object search, with lots of fields and some nested objects, structured like this:

- publication
	- mentionsType1 (nested)
		- identity1 (string)
		- identity2 (int)
		- identity3 (int)
		- markerAttributes (int)
	- mentionsType2 (nested)
		- identity1 (string)
		- markerAttributes (int)
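
As a rough sketch, the structure above would correspond to a mapping along these lines. The concrete field types (`keyword`, `integer`) are assumptions derived from the string/int annotations in the list, the typeless mapping syntax assumes Elasticsearch 7 or later, and all other publication fields are omitted:

```
PUT /my-test-index
{
  "mappings": {
    "properties": {
      "mentionsType1": {
        "type": "nested",
        "properties": {
          "identity1": { "type": "keyword" },
          "identity2": { "type": "integer" },
          "identity3": { "type": "integer" },
          "markerAttributes": { "type": "integer" }
        }
      },
      "mentionsType2": {
        "type": "nested",
        "properties": {
          "identity1": { "type": "keyword" },
          "markerAttributes": { "type": "integer" }
        }
      }
    }
  }
}
```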

The script looks like this:

// Collect markerAttributes from every nested mention that matches one of the passed identities.
def factors = new ArrayList();

if (params.acceptables.mentionsType1 != null && params._source['mentionsType1'] != null) {
	for (mentionsType1 in params._source['mentionsType1']) {
		if (mentionsType1 != null) {
			for (acceptable in params.acceptables.mentionsType1) {
				// A mention matches if any identity supplied in the params equals its own value.
				if (
					(acceptable.identity1 != null && mentionsType1['identity1'] == acceptable.identity1)
					|| (acceptable.identity2 != null && mentionsType1['identity2'] == acceptable.identity2)
					|| (acceptable.identity3 != null && mentionsType1['identity3'] == acceptable.identity3)
				) {
					if (mentionsType1['markerAttributes'] != null) {
						factors.add(mentionsType1['markerAttributes']);
					}
				}
			}
		}
	}
}

// same for mentionsType2

// Return null when nothing matched, otherwise the collected values.
if (factors.isEmpty()) {
	return null;
}

return factors;

and the query looks like this:


POST /my-test-index/_search
{
	"aggs": {

		"risks": {
			"terms": {
				"script": {
					"id": "group-by-risk-id",
					"params": {
						"acceptables": {
							"mentionsType1": [{"identity1":112233}]
						}
					}
				}
			}
		}
		
	},
	// don't mind the structure, it's just one nested term
	"query": {
		"constant_score": {
			"filter": {
				"bool": {
					"filter": [
						{
							"bool": {
								"must": [
									{
										"bool": {
											"must": [
												{
													"bool": {
														"must": [
															{
																"bool": {
																	"should": [
																		{
																			"nested": {
																				"path": "mentionsType1",
																				"query": {
																					"bool": {
																						"must": [
																							{
																								"term": {
																									"mentionsType1.identity1": {
																										"value": 112233
																									}
																								}
																							}
																						]
																					}
																				}
																			}
																		}
																	]
																}
															}
														]
													}
												}
												
											]
										}
									}
								]
							}
						},
						// SOME FILTER FOR DATES
					]
				}
			}
		}
	},
	"size": 0
}
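
For comparison, the kind of plain nested aggregation that runs in 1-3 seconds can be sketched roughly as follows. This is only an illustration, not the exact query from the tests: the aggregation names (`mentions`, `matching`, `by_marker`, `roots`) are hypothetical, and the `reverse_nested` sub-aggregation is an assumption about how per-bucket root document counts could be read. It has not been checked against the script's results.

```
POST /my-test-index/_search
{
  "size": 0,
  "aggs": {
    "mentions": {
      "nested": { "path": "mentionsType1" },
      "aggs": {
        "matching": {
          "filter": { "term": { "mentionsType1.identity1": 112233 } },
          "aggs": {
            "by_marker": {
              "terms": { "field": "mentionsType1.markerAttributes" },
              "aggs": {
                "roots": { "reverse_nested": {} }
              }
            }
          }
        }
      }
    }
  }
}
```

In this shape the `doc_count` on each `by_marker` bucket counts nested objects, while the `doc_count` under `roots` counts the root documents that contain them.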
