Hi, I would like to run a search over 1M filter values in parallel in order to reduce the search time. The query is a boolean query with 1M filter values, and no ordering is necessary. My index is about 1GB, and everything is stored in one shard on one node at the moment. I'm currently using searchAfter with 10k filter values per request, which means I run 100 searchAfter queries in sequence, each taking 10k of the 1M values. This currently takes a long time (more than a couple of minutes). I would like to parallelize the search so that the 100 queries run concurrently (e.g. on 100 client threads) and their results are aggregated within my application. What's the best way to do this? Some thoughts:
- Should I still use searchAfter if I want parallelism, or should I use multisearch? (I do not need ordering, sorting, or pagination.) Is multisearch also subject to the max_result_window limit? The only reason I tried searchAfter was to get past max_result_window.
- Should I split the index into more shards to introduce parallelism at my index size? The Elasticsearch documentation recommends shard sizes between 10GB and 50GB, however, and my whole index is only 1GB.
- Could I hypothetically add 100 replicas of the 1GB shard/node and issue 100 search requests in parallel? In theory, with a pool of 100 workers serving 100 simultaneous searches of 10k values each, each worker would only need to process one search operation.
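
A minimal sketch of the client-side fan-out described above, assuming Python on the application side. The names `chunked`, `parallel_filter_search` and `search_batch` are hypothetical and only for illustration; `search_batch` stands in for whatever call sends one batch of 10k filter values to Elasticsearch and returns its hits, so no real client API is shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def chunked(values: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `values` with at most `size` items each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def parallel_filter_search(
    values: Sequence,
    search_batch: Callable[[Sequence], List],
    batch_size: int = 10_000,
    max_workers: int = 100,
) -> List:
    """Run one search per batch on a thread pool and merge the hits.

    `search_batch` is a hypothetical placeholder: it should send one
    request filtering on the given batch of values and return its hits.
    """
    hits: List = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs the batches concurrently but yields results in batch order.
        for batch_hits in pool.map(search_batch, chunked(values, batch_size)):
            hits.extend(batch_hits)
    return hits


if __name__ == "__main__":
    # Stand-in for the real index: pretend only even ids exist.
    def fake_search_batch(batch: Sequence[int]) -> List[int]:
        return [v for v in batch if v % 2 == 0]

    found = parallel_filter_search(list(range(1_000_000)), fake_search_batch)
    print(len(found))
```

Since no ordering is needed, the merge is a plain concatenation. Whether 100 threads actually shorten the total time depends on how many concurrent searches the single node (or its replicas) can serve, which is the open part of the question.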