I'm trying to fully understand the best way to optimize transforms, and then the searches and aggregations that run on the resulting indices.
The situation:
- Currently stuck on Elasticsearch 7.17.21
- Non-managed cluster
- Conversion to data streams is extremely unlikely
- Average index size is about 750 GB
- Trying to apply transforms to make searching faster
- Indices previously had no sort applied
- Attempting to add index sorting to speed up both ad hoc searches and transform runs
- Read the Stack Overflow question /questions/67907634/elasticsearch-sorted-index-not-working-as-expected-with-multiple-shards
- Read the documents linked in the SO answer ("Introducing Index Sorting in Elasticsearch 6.0" on the Elastic Blog, and "Index Sorting" in the Elasticsearch Guide [8.15])
- The SO answer states that "your search queries will also need to contain the same sort specification at query time". I don't see this stated explicitly in the docs, although some of the examples do include the sort in the search query.
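For reference, the setup being discussed can be sketched as two request bodies: index creation with an index sort, and a search whose sort clause matches it. This is a minimal sketch, assuming a hypothetical index named `my-index-sorted` sorted ascending on `totalCount` (the field from the example below). Index sorting can only be set when an index is created, so existing indices would have to be reindexed into a new sorted index. The `track_total_hits: false` setting is what the Index Sorting docs pair with a matching sort to allow early termination of top-hits searches. The helper function is a hypothetical, simplified check, not Elasticsearch's own logic.

```python
import json

# Body for PUT /my-index-sorted (hypothetical index name).
# index.sort.* can only be set at index creation time.
create_index_body = {
    "settings": {
        "index": {
            "sort.field": ["totalCount"],
            "sort.order": ["asc"],
        }
    },
    "mappings": {
        "properties": {
            # The sort field must be mapped when the index is created.
            "totalCount": {"type": "long"},
        }
    },
}

# Body for GET /my-index-sorted/_search: the sort mirrors the index sort,
# and total hit counting is disabled so collection can stop early.
search_body = {
    "size": 10,
    "sort": [{"totalCount": "asc"}],
    "track_total_hits": False,
}


def sort_matches_index_sort(index_body, search):
    """Hypothetical, simplified check: True if the search sort lists exactly
    the same fields and orders as the index sort. Only handles sort clauses
    written in the short {"field": "order"} form."""
    settings = index_body["settings"]["index"]
    index_sort = list(zip(settings["sort.field"], settings["sort.order"]))
    search_sort = [next(iter(clause.items())) for clause in search.get("sort", [])]
    return search_sort == index_sort


if __name__ == "__main__":
    print(json.dumps(create_index_body, indent=2))
    print(json.dumps(search_body, indent=2))
    print(sort_matches_index_sort(create_index_body, search_body))
```

The dicts can be sent with any HTTP client or the official Python client; they are kept as plain data here so the shape of each request is easy to compare.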
I initially assumed that as long as the transform applied a filter/query on the sorted fields, it would be able to use the index sorting optimization (e.g. if the index is sorted on totalCount and the transform's query has a range of 5 to 100 in the query->bool->must array, it would run faster than it would against an unsorted index).
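The assumption above corresponds to a transform whose `source.query` filters on the sorted field. A minimal sketch of such a body for `PUT _transform/<transform_id>`, assuming the hypothetical source index `my-index-sorted`, a hypothetical destination index `my-transform-dest`, and a hypothetical `customerId` field to group by; only `totalCount` and the 5 to 100 range come from the question. Whether this query actually benefits from the index sort is exactly what is being asked, so the sketch only shows where the filter would live.

```python
import json

# Body for PUT _transform/<transform_id>. Index names and customerId are
# hypothetical placeholders; totalCount and the 5-100 range are from the question.
transform_body = {
    "source": {
        "index": ["my-index-sorted"],
        # Filters source documents before they are grouped: this is the
        # query->bool->must range described in the question.
        "query": {
            "bool": {
                "must": [
                    {"range": {"totalCount": {"gte": 5, "lte": 100}}}
                ]
            }
        },
    },
    "dest": {"index": "my-transform-dest"},
    "pivot": {
        "group_by": {
            "customer": {"terms": {"field": "customerId"}}
        },
        "aggregations": {
            "total_count_sum": {"sum": {"field": "totalCount"}}
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(transform_body, indent=2))
```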
The questions:
- Does the index sort order really not help if the sort is not included in the search query? How does this work for a "give me everything so I can aggregate on it" request with a size of 0? Does it still need a sort?
- Would I need to include a search query in a transform's definition for it to actually make use of the index sorting? (I'm mainly concerned about batching through old indices, not new data.)
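One way to test the first question empirically, rather than relying on the docs, is to run the same size 0 aggregation request with and without an explicit sort against the sorted index and compare the per-shard timings returned by the Profile API (`"profile": true`). A minimal sketch of the two request bodies, assuming the hypothetical `my-index-sorted` index and the `totalCount` range from above; the `with_sort` helper is hypothetical and only copies the body and adds a sort clause.

```python
import copy
import json

# Size-0 aggregation request over the totalCount range from the question.
base_agg_search = {
    "size": 0,
    "profile": True,  # return per-shard timing for query, collectors and aggregations
    "query": {
        "bool": {
            "must": [{"range": {"totalCount": {"gte": 5, "lte": 100}}}]
        }
    },
    "aggs": {
        "total_count_sum": {"sum": {"field": "totalCount"}}
    },
}


def with_sort(body, sort):
    """Hypothetical helper: return a deep copy of the request body with an
    explicit sort clause added, leaving the original body unchanged."""
    new_body = copy.deepcopy(body)
    new_body["sort"] = sort
    return new_body


# Two variants to send to GET /my-index-sorted/_search and compare.
variants = {
    "no_sort": base_agg_search,
    "index_sort": with_sort(base_agg_search, [{"totalCount": "asc"}]),
}

if __name__ == "__main__":
    for name, body in variants.items():
        print(name)
        print(json.dumps(body, indent=2))
```

Size 0 requests are eligible for the shard request cache, so sending both variants with `?request_cache=false` helps keep repeat runs from being served from cache instead of being re-executed.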