We're currently using Spark with es-hadoop to read 1 million documents from an Elasticsearch index. The sliced scan scroll that es-hadoop uses internally is not distributing the results evenly across the slices; in fact, for each shard-preferenced scan scroll, all slices but one are empty. Our setup:
- Elasticsearch 6.0.0
- Single node cluster for testing
- 5 Shards
- 1 million documents in an index
- es.input.max.docs.per.partition set to 50k
- A match_all query is run to fetch every document in the index; this results in 5 scan scrolls (one per shard) with 4 slices each.
- Of the 4 slices in each scan scroll, only 1 contains any results (~200k documents); the other 3 are empty.
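
For anyone who wants to reproduce this outside of Spark, here is a minimal sketch of the requests that would correspond to the setup above. It only builds the request descriptions and does not talk to a cluster. The index name `my-index`, the 5m scroll timeout, and the helper `build_slice_requests` are hypothetical, and the slice count formula (documents per shard divided by `es.input.max.docs.per.partition`, rounded up) is an assumption about how es-hadoop sizes its partitions, not something we have confirmed in its source.

```python
import json
import math


def build_slice_requests(index, num_shards, docs_per_shard, max_docs_per_partition):
    """Return one sliced-scroll request description per (shard, slice) pair."""
    # Assumption: slices per shard = ceil(docs in shard / max docs per partition).
    slices_per_shard = max(1, math.ceil(docs_per_shard / max_docs_per_partition))
    requests = []
    for shard in range(num_shards):
        for slice_id in range(slices_per_shard):
            body = {"query": {"match_all": {}}, "sort": ["_doc"]}
            # Elasticsearch requires slice.max > 1, so only slice when needed.
            if slices_per_shard > 1:
                body["slice"] = {"id": slice_id, "max": slices_per_shard}
            requests.append({
                "method": "POST",
                "path": f"/{index}/_search",
                # preference=_shards:N pins the scroll to a single shard.
                "params": {"scroll": "5m", "preference": f"_shards:{shard}"},
                "body": body,
            })
    return requests


if __name__ == "__main__":
    # 1 million documents over 5 shards is roughly 200k per shard.
    reqs = build_slice_requests("my-index", 5, 200_000, 50_000)
    print(len(reqs), "requests")
    print(json.dumps(reqs[0], indent=2))
```

Sending each body to its path with the listed parameters and comparing `hits.total` per slice should show whether the skew comes from Elasticsearch itself or from es-hadoop.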
Is there any way to distribute the results evenly across the slices? I believe this may be the same issue as the one described in https://github.com/elastic/elasticsearch/issues/27550.