I have implemented a system of aggregations for search filters and am running into performance problems with longer queries: queries of up to 4 words are fine, but anything longer is extremely slow.
I ran the Profile API, and it shows that the aggregations are the bottleneck, taking roughly 24 s of the 25 s the search requires. Here is a link to the gist with the full profiling output.
As can be seen from the gist, I am using sampler aggregations; without them, searches take well over 60 seconds and constantly time out.
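For readers who do not want to open the gist, the general shape of such a request is sketched below in Python. This is only an illustration: the helper name `build_search_body`, the `multi_match` query, the `shard_size` of 200, and the aggregation names are placeholders, not the exact values from the gist. Only the field names come from the mapping.

```python
import json


def build_search_body(query_text, shard_size=200):
    """Build a search body whose terms aggregation runs inside a sampler.

    The sampler restricts the sub-aggregation to the top-scoring
    `shard_size` documents on each shard instead of every match.
    All values here are illustrative placeholders.
    """
    return {
        "profile": True,  # ask Elasticsearch to return profiling details
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["description", "translated_title", "episode_title"],
            }
        },
        "aggs": {
            "sample": {
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "actors": {
                        "terms": {"field": "actors_keyword", "size": 10}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent to the _search endpoint of the index.
    print(json.dumps(build_search_body("a longer query with several words"), indent=2))
```

The printed JSON is what would go in the body of a search request against the `movies` index.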
The mapping for my index is the following:
{
  "movies" : {
    "mappings" : {
      "properties" : {
        "description" : {
          "type" : "text"
        },
        "all_actors" : {
          "type" : "text"
        },
        "episode_title" : {
          "type" : "text"
        },
        "actors_keyword" : {
          "type" : "keyword",
          "ignore_above" : 1000
        },
        "series_title" : {
          "type" : "keyword",
          "ignore_above" : 1000
        },
        "language" : {
          "type" : "keyword",
          "ignore_above" : 1000
        },
        "number_of_actors" : {
          "type" : "short"
        },
        "translated_title" : {
          "type" : "text"
        },
        "subject_areas" : {
          "type" : "text"
        },
        "subject_areas_keyword" : {
          "type" : "keyword",
          "ignore_above" : 1000
        },
        "url" : {
          "type" : "text"
        },
        "year" : {
          "type" : "short",
          "null_value" : 0
        }
      }
    }
  }
}
Other data that might be helpful:
- number of shards: 1
- number of replicas: 0
- number of documents in the index: approx. 80 million
- size of the index: approx. 30 GB
- fields such as 'actors_keyword' and 'subject_areas' are high-cardinality, and I am not currently using the 'eager_global_ordinals' setting
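For context, enabling 'eager_global_ordinals' on the keyword fields would presumably mean a mapping update along the lines of the sketch below. The helper name `eager_ordinals_mapping` is hypothetical, and the choice of fields is an assumption; the resulting JSON would be the body of a `PUT /movies/_mapping` request.

```python
import json


def eager_ordinals_mapping(field_names):
    """Return a put-mapping body that turns on eager_global_ordinals.

    Each field keeps the keyword type and ignore_above value from the
    existing mapping; only the eager_global_ordinals flag is added.
    """
    return {
        "properties": {
            name: {
                "type": "keyword",
                "ignore_above": 1000,
                "eager_global_ordinals": True,
            }
            for name in field_names
        }
    }


if __name__ == "__main__":
    # Assumed candidates: the keyword fields used in the aggregations.
    fields = ["actors_keyword", "subject_areas_keyword"]
    print(json.dumps(eager_ordinals_mapping(fields), indent=2))
```

With this setting, global ordinals are built when a shard is refreshed rather than on the first aggregation that needs them, which shifts the cost from search time to refresh time.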
If I turn off the aggregations altogether, search queries perform well until they reach about 20 words or more.
The question is: how can I improve the current situation? I am a novice, so I may be missing even some obvious details.