Hey there,
As far as I know, terms aggregations are probably among the most memory-intensive and least accurate search requests in Elasticsearch. Having said that, I would like some guidelines on the following problem.
Document Structure
Let's say I have an index containing information about movies and actors. That is, I have one document type, Movie, and each movie document has nested documents containing the actors of that movie.
You can see an illustration below:
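A minimal sketch of such a structure, written as Python dictionaries that could be sent as the index mapping and as one document. The field names (`title`, `year`, `actors.name`) and the sample values are hypothetical and used only for illustration; the typeless mapping layout assumes a recent Elasticsearch version.

```python
import json

# Hypothetical mapping: one Movie document with nested actor documents.
# All field names are illustrative assumptions, not the real schema.
movie_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "year": {"type": "integer"},
            "actors": {
                "type": "nested",
                "properties": {
                    # keyword, so the name can be used in a terms aggregation
                    "name": {"type": "keyword"},
                },
            },
        }
    }
}

# Hypothetical movie document matching the mapping above.
sample_movie = {
    "title": "Some Movie",
    "year": 2010,
    "actors": [{"name": "Actor A"}, {"name": "Actor B"}],
}

print(json.dumps(movie_mapping, indent=2))
print(json.dumps(sample_movie, indent=2))
```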
The Search Requirement
Let's say I have a search query that gets the top 10 actors who appear most often in a specific group of movies. That is, the request narrows down the movies with queries (range query, filter query) and then finds the requested actors with aggregations (a main terms aggregation and a value_count sub-aggregation).
It would be something like this:
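A rough sketch of such a request body, built as a Python dictionary. It assumes the hypothetical fields `year` and `actors.name`, a year range as the narrowing query, and the aggregation names `actors`, `top_actors`, and `movie_count`; none of these come from a real index.

```python
import json

def build_top_actors_request(year_from, year_to, top_n=10):
    """Build a search body: filter movies by year, then rank actors.

    Field and aggregation names are hypothetical placeholders.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"year": {"gte": year_from, "lte": year_to}}}
                ]
            }
        },
        "aggs": {
            "actors": {
                # step into the nested actor documents
                "nested": {"path": "actors"},
                "aggs": {
                    "top_actors": {
                        # main terms aggregation, ordered by the sub-aggregation
                        "terms": {
                            "field": "actors.name",
                            "size": top_n,
                            "order": {"movie_count": "desc"},
                        },
                        "aggs": {
                            "movie_count": {
                                "value_count": {"field": "actors.name"}
                            }
                        },
                    }
                },
            }
        },
    }

request_body = build_top_actors_request(2000, 2010)
print(json.dumps(request_body, indent=2))
```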
The Problem
In the real world, there could be O(N) movies and O(N) actors. Because I want each main bucket to be sorted by its sub-bucket, this search request could create up to N^2 buckets.
That raises the following questions:
- Is this type of query even possible in Elasticsearch?
- Is there any optimization I could apply to the terms aggregation itself? I mean tuning terms aggregation parameters such as collect_mode, execution_hint, and so forth.
- Is there any optimization I could apply to other Elasticsearch components to serve this query better? I mean mapping settings (such as eager_global_ordinals) or other index settings (such as caching).
- It feels like there is an alternative: replacing the terms aggregation with something like a top_hits aggregation or a significant_terms aggregation, but I haven't figured out how.
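To make the second and third questions concrete, here is a sketch of where those settings would sit, again as Python dictionaries. The helper `with_terms_tuning` and the field name `actors.name` are hypothetical, and the chosen values are only examples; whether any of them actually helps for this request is exactly what I am asking.

```python
def with_terms_tuning(terms_body, collect_mode="breadth_first",
                      execution_hint="map", shard_size=None):
    """Return a copy of a terms aggregation body with tuning parameters set.

    This is a hypothetical helper; the values are examples, not advice.
    """
    tuned = dict(terms_body)  # shallow copy, the input is left unchanged
    tuned["collect_mode"] = collect_mode      # "depth_first" or "breadth_first"
    tuned["execution_hint"] = execution_hint  # "map" or "global_ordinals"
    if shard_size is not None:
        tuned["shard_size"] = shard_size      # candidates collected per shard
    return tuned

base_terms = {"field": "actors.name", "size": 10}
tuned_terms = with_terms_tuning(base_terms, shard_size=100)

# Mapping-side setting: build global ordinals at refresh time
# instead of on the first aggregation that needs them.
actor_name_field_mapping = {
    "type": "keyword",
    "eager_global_ordinals": True,
}

print(tuned_terms)
print(actor_name_field_mapping)
```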
I would really appreciate any help you can provide,
Idan