Slow sub-aggregation for low-cardinality field + high-cardinality field

Hi there,

I'm having index having 14M documents, occupying 1.6TB of pri.storage with 50shards.

There is query doing 22 aggregations (mainly simple terms aggs), took 0.3s.
I needed to add one sub-aggregation to every of those 22 aggs and execution time (took) grows to 5.3s.

This sub-agg is "cardinality" aggregation over field having ~1.15M distinct values (DOC_DEAL_NO)

      "uwrtYear" : {
        "terms" : {
          "field" : "content.ACCP_UWRT_YR.keyword",
          "order" : {
            "_key" : "desc"
          },
          "size" : 50
        },
        "aggs" : {
          "deals_count" : {
            "cardinality" : {
              "field" : "content.DOC_DEAL_NO.keyword"
            }
          }
        }
      }

Average times collected from "profile":true response:

The most interesting fact is, that aggregations slowing down the request mostly are the ones, with low cardinality (at the end of the table - uwrtStatus, reinsurer, uwrtYear - column A)

Checking the "profile" part of the response, "collect" phase takes most of the time.

{
	"type": "GlobalOrdinalsStringTermsAggregator",
	"description": "uwrtStatus",
	"time": "257.2ms",
	"time_in_nanos": 257260689,
	"breakdown": {
		"reduce": 0,
		"post_collection_count": 1,
		"build_leaf_collector": 2930416,
		"build_aggregation": 1053952,
		"build_aggregation_count": 1,
		"build_leaf_collector_count": 36,
		"post_collection": 10203,
		"initialize": 584,
		"initialize_count": 1,
		"reduce_count": 0,
		"collect": 253265534,
		"collect_count": 184731
	},
	"debug": {
		"segments_with_multi_valued_ords": 0,
		"collection_strategy": "remap using single bucket ords",
		"segments_with_single_valued_ords": 36,
		"total_buckets": 26,
		"built_buckets": 1,
		"result_strategy": "terms",
		"has_filter": false
	},
	"children": [
		{
			"type": "CardinalityAggregator",
			"description": "deals_count",
			"time": "230.8ms",
			"time_in_nanos": 230810795,
			"breakdown": {
				"reduce": 0,
				"post_collection_count": 1,
				"build_leaf_collector": 1278362,
				"build_aggregation": 1033225,
				"build_aggregation_count": 1,
				"build_leaf_collector_count": 36,
				"post_collection": 8816,
				"initialize": 59,
				"initialize_count": 1,
				"reduce_count": 0,
				**"collect": 228490333,**
				"collect_count": 184731
			},
			"debug": {
				"ordinals_collectors_used": 32,
				"ordinals_collectors_overhead_too_high": 4,
				"built_buckets": 26,
				"string_hashing_collectors_used": 4,
				"numeric_collectors_used": 0,
				"empty_collectors_used": 0
			}
		}
	]
}

I have tried to apply "eager_global_ordinals": true for DOC_DEAL_NO, but no luck, still getting 5.3s response time.

Can anybody help with ideas, what else I can do/check to speed up it?

Thanks!
Dominik

I was able to move forward with this solution pre_computed_hashes with "mapper-murmur3" plugin over high-cardinality field (DOC_DEAL_NO).

Running the same query (using .hash field in sub-aggregation) the request time drops to 1.6s.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.