Sort Terms Aggregation By Parent Docs Count

panda2004 · May 6, 2017, 12:55am

Hey there,

As far as I know, terms aggregations are probably among the most memory-intensive and least accurate search requests in Elasticsearch. Having said that, I would like some guidance on the following problem.

Document Structure

Let's say I have an index containing information about movies and actors. That is, I have one document type, Movie, and each movie document has nested documents containing the actors of that movie.
You can see an illustration below:
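As a rough stand-in for the structure described above, here is a minimal Python sketch of what such a mapping and document might look like. The field names (`title`, `year`, `actors`, `actors.name`) are assumptions made for illustration and are not taken from the poster's actual index.

```python
import json


def build_movie_mapping():
    """Return a hypothetical 'properties' section for a movie document.

    Assumption: actors are stored as nested documents under 'actors',
    each with a single keyword field 'name'.
    """
    return {
        "properties": {
            "title": {"type": "text"},
            "year": {"type": "integer"},
            "actors": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                },
            },
        }
    }


# A hypothetical movie document that fits the mapping above.
example_movie = {
    "title": "Some Movie",
    "year": 2015,
    "actors": [
        {"name": "Actor One"},
        {"name": "Actor Two"},
    ],
}

if __name__ == "__main__":
    print(json.dumps(build_movie_mapping(), indent=2))
    print(json.dumps(example_movie, indent=2))
```

Each entry in `actors` becomes a separate nested document attached to its parent movie, which is why aggregations on `actors.name` have to go through a `nested` aggregation.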

The Search Requirement

Let's say I have a search request to get the top 10 actors who appeared most often in a specific group of movies. That is, the request narrows down the movies with queries (range query, filter query) and then finds the requested actors with aggregations (a main terms aggregation and a value count sub-aggregation).
It would be something like this:
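A request along these lines could be sketched in Python as a plain request body. This is not the poster's original query: the field names (`year`, `actors`, `actors.name`) and aggregation names are hypothetical, and a `reverse_nested` sub-aggregation is used here in place of the value count sub-aggregation mentioned above, on the assumption that the goal is to order actors by the number of parent movie documents.

```python
import json


def build_top_actors_request(min_year, max_year, top_n=10):
    """Build a hypothetical search body: filter movies, then rank actors.

    Assumptions: movies have an integer 'year' field and nested 'actors'
    documents with a keyword 'name' field.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"year": {"gte": min_year, "lte": max_year}}}
                ]
            }
        },
        "aggs": {
            "actors": {
                # step into the nested actor documents
                "nested": {"path": "actors"},
                "aggs": {
                    "top_actors": {
                        "terms": {
                            "field": "actors.name",
                            "size": top_n,
                            # order each actor bucket by the doc_count of
                            # the single-bucket sub-aggregation below
                            "order": {"movie_count": "desc"},
                        },
                        "aggs": {
                            # step back out to the parent movie documents
                            "movie_count": {"reverse_nested": {}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_top_actors_request(2000, 2010), indent=2))
```

The resulting dictionary could be passed as the body of a search request by whatever client is in use. Whether Elasticsearch handles this ordering efficiently on a large index is exactly the open question of this thread.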

The Problem

In the real world, there could be O(N) movies and O(N) actors. Because I want each main bucket to be sorted by its sub-bucket, this search request could create N^2 buckets.

That raises the following questions:

Is this type of query even possible in Elasticsearch?
Is there any optimization I could apply to the terms aggregation itself? I mean tuning terms aggregation parameters such as collect mode, execution hint and so forth.
Is there any optimization I could apply to other Elasticsearch components to serve this query? I mean mapping settings (such as eager_global_ordinals) or other index settings (such as caching).
It feels like there is an alternative: replacing the terms aggregation with something like a top_hits aggregation or a significant_terms aggregation, but I haven't figured out how.
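For reference, the tuning knobs named in the questions above sit in two places: `collect_mode` and `execution_hint` are parameters of the terms aggregation, while `eager_global_ordinals` is a mapping parameter on the field. The Python sketch below only shows where they go. The field name `actors.name` and the chosen values are assumptions, and nothing here shows that these settings actually help for this workload.

```python
def build_tuned_terms_agg(field="actors.name", top_n=10):
    """Return a terms aggregation with the tuning parameters spelled out.

    The values are illustrative assumptions, not recommendations.
    """
    return {
        "terms": {
            "field": field,
            "size": top_n,
            # ask each shard for more candidates than the final size
            "shard_size": top_n * 5,
            # prune to the top buckets before computing sub-aggregations
            "collect_mode": "breadth_first",
            # 'map' builds buckets from field values directly;
            # 'global_ordinals' is the other common choice
            "execution_hint": "map",
        }
    }


def build_actor_name_mapping(eager=True):
    """Return a hypothetical field mapping for the actor name field.

    eager_global_ordinals moves the cost of building global ordinals
    from the first search after a refresh to the refresh itself.
    """
    return {"type": "keyword", "eager_global_ordinals": eager}
```

Note that eagerly built global ordinals only matter when the aggregation actually uses them, so combining `eager_global_ordinals` with `execution_hint: map` as above would likely be pointless; the two functions are shown together only to cover both settings.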

I really appreciate any help you can provide,
Idan

panda2004 · May 6, 2017, 2:01am

I thought about my problem again and I have a few insights.

An actor can't belong to a movie more than once (and that's exactly the bug I have in my system now). Therefore, sorting is probably not needed: the top 10 actors are simply those who played in the most movies.
In practice, the query I wrote probably isn't feasible in Elasticsearch anyway, because of a known issue: 16838
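The first insight can be checked with a small simulation in plain Python, independent of Elasticsearch. The sample data and function names below are made up for illustration. The idea is that counting nested actor documents per name gives the same result as counting parent movies per name, as long as no actor is listed twice in the same movie.

```python
from collections import Counter

# Hypothetical sample data: each movie lists its actors once.
movies = [
    {"title": "A", "actors": ["Ann", "Bob"]},
    {"title": "B", "actors": ["Ann", "Cid"]},
    {"title": "C", "actors": ["Ann", "Bob", "Cid"]},
    {"title": "D", "actors": ["Bob"]},
]


def nested_doc_counts(movie_list):
    """Count nested actor documents per actor name."""
    return Counter(name for movie in movie_list for name in movie["actors"])


def parent_doc_counts(movie_list):
    """Count distinct parent movies per actor name."""
    return Counter(name for movie in movie_list for name in set(movie["actors"]))


# With no duplicates inside a movie, both counts agree, so ranking
# by nested document count needs no extra sub-aggregation.
assert nested_doc_counts(movies) == parent_doc_counts(movies)

# With a duplicate (the bug mentioned above), the counts diverge.
buggy_movies = movies + [{"title": "E", "actors": ["Ann", "Ann"]}]
assert nested_doc_counts(buggy_movies)["Ann"] == parent_doc_counts(buggy_movies)["Ann"] + 1
```

Under that assumption, a plain terms aggregation inside the nested aggregation, with its default ordering by document count, would already rank actors by the number of movies they appear in.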

system · June 3, 2017, 2:01am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
