ES Version: 7.15.0
We were exploring the significant_text aggregation in ES (via the REST API) to generate a word cloud from our data. Because the query was slow, we decided to set a timeout. Here is what the official documentation says about the timeout parameter:
Each shard collects hits within the specified time period. If collection isn’t finished when the period ends, Elasticsearch uses only the hits accumulated up to that point. The overall latency of a search request depends on the number of shards needed for the search and the number of concurrent shard requests.
And here is my request:
GET /<24-indices>/_search
{
  "timeout": "500ms",
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "channelId": [
              // 50+ channel ids
            ]
          }
        },
        {
          "range": {
            "postPublishedOn": {
              "gte": "2021-01-01",
              "lte": "2022-12-31"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 10000
      },
      "aggs": {
        "keywords": {
          "significant_text": {
            "field": "caption",
            "filter_duplicate_text": true,
            "size": 25
          }
        }
      }
    }
  }
}
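For experimenting with different timeout and shard_size values, the same body can be generated programmatically. This is only a minimal sketch: the function name build_search_body and its parameters are my own, and the channel ids in the usage line are placeholders, not values from my data.

```python
import json


def build_search_body(channel_ids, published_from, published_to,
                      timeout="500ms", shard_size=10000, size=25):
    """Build the search body shown above as a plain dict.

    Hypothetical helper: only the field names and default values
    are taken from the request in this post.
    """
    return {
        "timeout": timeout,
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"channelId": list(channel_ids)}},
                    {
                        "range": {
                            "postPublishedOn": {
                                "gte": published_from,
                                "lte": published_to,
                            }
                        }
                    },
                ]
            }
        },
        "aggs": {
            "sample": {
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "keywords": {
                        "significant_text": {
                            "field": "caption",
                            "filter_duplicate_text": True,
                            "size": size,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # "c1" and "c2" are placeholder channel ids.
    body = build_search_body(["c1", "c2"], "2021-01-01", "2022-12-31")
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the console or sent with any HTTP client, which makes it easy to compare runs with, say, a smaller shard_size.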
Here is the result:
{
  "took": 29652,
  "timed_out": true,
  "_shards": {
    "total": 24,
    "successful": 24,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 0,
    "hits": [
      ...
    ]
  },
  "aggregations": {
    ...
  }
}
As we can see, the request is reported as timed out, yet it still took around 30 s even though the timeout was set to 500 ms. What am I missing here?
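To put a number on the gap, here is a small sketch that compares the took value of a response with the requested timeout. The helper names parse_timeout_ms and timeout_overshoot are hypothetical, and the parser only handles the ms, s, and m units, which is a subset of what Elasticsearch accepts.

```python
import re

# Only a subset of Elasticsearch time units is handled in this sketch.
_UNIT_TO_MS = {"ms": 1, "s": 1000, "m": 60_000}


def parse_timeout_ms(value):
    """Convert a string such as '500ms' or '2s' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", value)
    if match is None:
        raise ValueError(f"unsupported timeout value: {value!r}")
    return int(match.group(1)) * _UNIT_TO_MS[match.group(2)]


def timeout_overshoot(response, timeout):
    """Report how far 'took' exceeded the requested timeout."""
    budget_ms = parse_timeout_ms(timeout)
    took_ms = response["took"]
    return {
        "timed_out": response["timed_out"],
        "budget_ms": budget_ms,
        "took_ms": took_ms,
        "overshoot_ms": took_ms - budget_ms,
        "ratio": took_ms / budget_ms,
    }


if __name__ == "__main__":
    # Values copied from the response above.
    print(timeout_overshoot({"took": 29652, "timed_out": True}, "500ms"))
```

For the response above, this reports an overshoot of 29152 ms, that is, the request ran roughly 59 times longer than the 500 ms budget.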
The search went through 24 shards. As per the documentation, at least some of these 24 shard requests should have executed in parallel, each with a timeout of 500 ms. Even if they had all executed sequentially, that would be 24 x 500 ms = 12 s, which is well below 30 s.
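The arithmetic behind that expectation can be written down as a rough sketch. It encodes my own assumption, namely that every shard stops strictly at its timeout and that shards run in waves of a fixed size; it is not a description of how Elasticsearch actually schedules shard requests. The function name and the concurrency value of 5 are illustrative only.

```python
import math


def naive_worst_case_ms(num_shards, per_shard_timeout_ms, concurrency):
    """Upper bound on latency IF each shard stopped exactly at its timeout.

    Assumption (mine, not from the docs): shards run in waves of
    `concurrency` requests, and each wave lasts at most one timeout.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    waves = math.ceil(num_shards / concurrency)
    return waves * per_shard_timeout_ms


if __name__ == "__main__":
    observed_ms = 29652  # "took" from the response above
    for concurrency in (1, 5, 24):  # fully sequential, illustrative, fully parallel
        bound = naive_worst_case_ms(24, 500, concurrency)
        print(concurrency, bound, observed_ms > bound)
```

Under this model the fully sequential case gives 12000 ms and the fully parallel case gives 500 ms, and the observed 29652 ms exceeds both, so the model itself must be missing something.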
So here are my questions.
- Will the requests to different shards execute in parallel in all circumstances? If not, in what scenario would they execute sequentially?
- In either case, why does the query take so much longer than the timeout even though it is reported as timed out?
- This one is about the significant_text aggregation. Does it really take that long to execute? Below are more details about my dataset.
Shard details:
{
  "docs": {
    "count": 106287465,
    "deleted": 30635607
  },
  "shards": {
    "total_count": 24
  },
  "store": {
    "size_in_bytes": 106441596761,
    "total_data_set_size_in_bytes": 106441596761,
    "reserved_in_bytes": 0
  }
}
Approximate matched doc count: 35k
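For a sense of scale, here is a rough sketch that derives a few per-shard figures from the numbers above. It assumes that the store size is spread evenly across the 24 shards and that docs.count excludes the deleted documents; both are assumptions on my side, and the function name dataset_summary is hypothetical.

```python
def dataset_summary(live_docs, deleted_docs, store_bytes, shard_count, matched_docs):
    """Derive rough per-shard figures from the index stats.

    Assumes an even spread across shards and that live_docs does not
    include deleted_docs; neither is confirmed by the stats themselves.
    """
    gib = 1024 ** 3
    return {
        "avg_shard_size_gib": store_bytes / shard_count / gib,
        "avg_live_docs_per_shard": live_docs / shard_count,
        "deleted_fraction": deleted_docs / (live_docs + deleted_docs),
        "matched_fraction": matched_docs / live_docs,
    }


if __name__ == "__main__":
    summary = dataset_summary(
        live_docs=106287465,
        deleted_docs=30635607,
        store_bytes=106441596761,
        shard_count=24,
        matched_docs=35000,  # approximate, from the line above
    )
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

With these inputs each shard holds a bit over 4 GiB on average, a little more than a fifth of all stored documents are marked deleted, and the filter matches well under 0.1% of the live documents.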