Sort Terms Aggregation By Parent Docs Count

Hey there,

As far as I know, terms aggregations are among the most memory-intensive and least accurate search requests in Elasticsearch. That said, I would like some guidance on the following problem.

Document Structure

Let's say I have an index that contains information about movies and actors. That is, I have one document type, Movie, and each movie document has nested documents containing the actors of that movie.
You can see an illustration below:
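
To make the structure concrete, here is a minimal sketch of such a mapping and one sample document, written as Python dicts. All field names (`title`, `release_year`, `actors`, `name`) are assumptions for illustration, not my real schema.

```python
# Hypothetical index mapping: one Movie document with nested actor documents.
# Every field name here is an assumption made for illustration.
MOVIE_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "release_year": {"type": "integer"},
            "actors": {
                # "nested" indexes each actor as its own hidden document
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                },
            },
        }
    }
}

# A sample movie document matching the mapping above (made-up values).
SAMPLE_MOVIE = {
    "title": "Some Movie",
    "release_year": 2015,
    "actors": [{"name": "Actor A"}, {"name": "Actor B"}],
}
```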

The Search Requirement

Let's say I have a search query to get the top 10 actors who appeared most often in a specific group of movies. That is, the query narrows down the movies using queries (range query, filter query) and then finds the requested actors using aggregations (a main terms aggregation and a value count sub-aggregation).
It would be something like this:
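
A rough sketch of the request body, built as a Python dict. The field names (`release_year`, `actors.name`), the aggregation names and the helper `build_top_actors_query` are hypothetical; the shape (a range filter, a `nested` aggregation, a `terms` aggregation ordered by a `value_count` sub-aggregation) is what I am describing.

```python
def build_top_actors_query(year_from, year_to, size=10):
    """Build a search body for the top `size` actors in a filtered set of movies.

    Field and aggregation names are assumptions for illustration.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"release_year": {"gte": year_from, "lte": year_to}}}
                ]
            }
        },
        "aggs": {
            "actors": {
                # step into the nested actor documents
                "nested": {"path": "actors"},
                "aggs": {
                    "top_actors": {
                        "terms": {
                            "field": "actors.name",
                            "size": size,
                            # sort the buckets by the sub-aggregation below
                            "order": {"movie_count": "desc"},
                        },
                        "aggs": {
                            "movie_count": {
                                "value_count": {"field": "actors.name"}
                            }
                        },
                    }
                },
            }
        },
    }


body = build_top_actors_query(2000, 2010)
```

The resulting `body` would be sent as the body of a `_search` request against the movies index.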

The Problem

In the real world, there could be O(N) movies and O(N) actors. Because I want each main bucket to be sorted by its sub-bucket, this search request could create N^2 buckets.

That raises the following questions:

  1. Is this type of query even possible in Elasticsearch?

  2. Is there any optimization I could use in the terms aggregation to speed up the search request? I mean tuning terms aggregation parameters such as collect mode, execution hint and so forth.

  3. Is there any optimization I could apply to other Elasticsearch components to serve this query better? I mean mapping settings (such as eager_global_ordinals) or other index settings (such as caching).

  4. It feels like there is an alternative: replacing the terms aggregation with something like a top_hits aggregation or a significant_terms aggregation, but I haven't figured out how.
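
To be specific about the knobs in questions 2 and 3, this is roughly what I mean, again as Python dicts. The parameters `collect_mode`, `execution_hint` and `eager_global_ordinals` exist in Elasticsearch; the values chosen here, the field name and the helper `tuned_terms` are only assumptions, and I don't know whether they actually help in this case.

```python
def tuned_terms(field, size=10, collect_mode="breadth_first", execution_hint="map"):
    """Return a terms aggregation with explicit tuning parameters.

    collect_mode accepts "depth_first" or "breadth_first";
    execution_hint accepts "map" or "global_ordinals".
    Which combination is best here is exactly what I am asking.
    """
    return {
        "terms": {
            "field": field,
            "size": size,
            "collect_mode": collect_mode,
            "execution_hint": execution_hint,
        }
    }


# Hypothetical mapping fragment for the actor name field, with global
# ordinals built eagerly at refresh time instead of on the first search.
ACTOR_NAME_FIELD = {"type": "keyword", "eager_global_ordinals": True}

agg = tuned_terms("actors.name")
```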

I really appreciate any help you can provide,
Idan

I thought about my problem again and I have a few insights.

  • An actor can't belong to a movie more than once (and that's exactly the bug I have in my system now). Therefore, sorting by a sub-aggregation is probably not needed: the top 10 actors are simply those who appear in the most movies.
  • In practice, the query I wrote probably isn't acceptable in Elasticsearch anyway, because of a known issue: 16838
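
A small sketch of the first insight, using made-up data: when an actor appears at most once per movie, counting nested actor documents gives the same numbers as counting parent movies, so the default terms ordering (by doc_count, descending) should be enough. The field and aggregation names are hypothetical.

```python
from collections import Counter

# Made-up movies; each actor appears at most once per movie.
movies = [
    {"title": "M1", "actors": ["A", "B"]},
    {"title": "M2", "actors": ["A", "C"]},
    {"title": "M3", "actors": ["A", "B"]},
]

# Count of nested actor documents per actor (what a terms bucket's doc_count is).
nested_doc_counts = Counter(a for m in movies for a in m["actors"])

# Count of distinct parent movies per actor.
parent_movie_counts = Counter(a for m in movies for a in set(m["actors"]))

# With no duplicates inside a movie, the two counts coincide.
counts_match = nested_doc_counts == parent_movie_counts

# Simplified aggregation: no "order" and no sub-aggregation, relying on the
# default terms ordering by doc_count descending.
SIMPLIFIED_AGGS = {
    "actors": {
        "nested": {"path": "actors"},
        "aggs": {
            "top_actors": {"terms": {"field": "actors.name", "size": 10}}
        },
    }
}
```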
