I run an Elasticsearch cluster with dozens of time-based monthly indexes and 240+ shards. Query performance is always a priority, so I have been wondering whether there is a way to limit which indexes/shards are actually queried. I have already searched the documentation and GitHub issues, without success.
Specifically, this is my situation:
- The indexes use index sorting by date:desc, key:asc.
- All queries sort by those same keys.
- All queries run in filter context, using a query_string query.
- All queries set track_total_hits to 10000.
- Some queries also filter on the date (e.g. date:>now-1h).
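For reference, here is a simplified sketch of the setup described above, written as plain Python dicts that serialize to the JSON Elasticsearch expects. The field names date and key come from the list; the keyword mapping for key and the status field in the sample query string are assumptions for illustration only.

```python
import json

# Field names taken from the setup above; sort directions match the index sort.
SORT_FIELDS = ["date", "key"]
SORT_ORDERS = ["desc", "asc"]


def index_settings() -> dict:
    """Settings/mappings for one monthly index with index sorting on date:desc, key:asc."""
    return {
        "settings": {
            "index": {
                "sort.field": SORT_FIELDS,
                "sort.order": SORT_ORDERS,
            }
        },
        "mappings": {
            "properties": {
                "date": {"type": "date"},
                # Assumption: "key" is a keyword field (index sort needs doc values).
                "key": {"type": "keyword"},
            }
        },
    }


def search_body(query_string: str) -> dict:
    """query_string in filter context, sorted like the index, hit counting capped at 10000."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"query_string": {"query": query_string}}
                ]
            }
        },
        "sort": [
            {field: {"order": order}}
            for field, order in zip(SORT_FIELDS, SORT_ORDERS)
        ],
        "track_total_hits": 10000,
    }


if __name__ == "__main__":
    print(json.dumps(index_settings(), indent=2))
    # Hypothetical query: "status" is an illustrative field, not part of the real setup.
    print(json.dumps(search_body("status:error AND date:>now-1h"), indent=2))
```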
Whenever I run a query, it seems to hit all 240+ shards, and the response never reports any skipped shards. I was hoping for two things:
- Indexes/shards that cannot match a date filter in the query string should be skipped.
- Older indexes/shards should not be queried once the query has already collected 10000 hits from the most recent monthly index.
I hope this question makes sense. I've tried tuning the max_concurrent_shard_requests and pre_filter_shard_size parameters, without success.
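For clarity, both of those parameters are passed as URL query parameters on the _search endpoint rather than in the request body. A minimal sketch of building such a URL, assuming a local cluster at http://localhost:9200 and a hypothetical index pattern logs-*; the numeric values are illustrative only:

```python
from urllib.parse import urlencode


def search_url(host: str, index_pattern: str,
               max_concurrent_shard_requests: int,
               pre_filter_shard_size: int) -> str:
    """Build a _search URL carrying the two shard-tuning knobs as query parameters."""
    params = {
        "max_concurrent_shard_requests": max_concurrent_shard_requests,
        "pre_filter_shard_size": pre_filter_shard_size,
    }
    return f"{host.rstrip('/')}/{index_pattern}/_search?{urlencode(params)}"


# Hypothetical host and index pattern; values chosen only to show the syntax.
print(search_url("http://localhost:9200", "logs-*", 5, 1))
```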