Empty Slices with Scan/Scroll


We're currently using Spark with es-hadoop to read 1 million documents from an Elasticsearch index. The sliced scan-and-scroll that es-hadoop uses internally is not distributing results evenly across the slices; in fact, for each shard-preferenced scroll, all but one slice is empty.

  • Elasticsearch 6.0.0
  • Single node cluster for testing
  • 5 Shards
  • 1 million documents in an index
  • es.input.max.docs.per.partition set to 50k
  1. Match All query is run to obtain all documents from the index; this results in 5 scan scrolls with 4 slices each.
  2. Of the 4 slices in each scan scroll, only 1 contains any results (~200k documents).
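For reference, the intended behaviour is that each document lands in a slice by hashing the slice field modulo the number of slices, which should spread documents roughly evenly. The toy sketch below illustrates that principle only; it uses Python's built-in `hash` and made-up document ids, not Elasticsearch's actual hash or slicing implementation:

```python
from collections import Counter

def slice_for(doc_id: str, max_slices: int) -> int:
    # Illustration only: assign a document to a slice by hashing its id
    # modulo the slice count (this is NOT Elasticsearch's actual hash).
    return hash(doc_id) % max_slices

# Simulate ~200k documents spread across 4 slices.
counts = Counter(slice_for(f"doc-{i}", 4) for i in range(200_000))
# A well-distributed hash puts roughly 50k documents in each slice.
# The behaviour reported above (one slice with ~200k, three empty)
# shows the documents were not being split this way.
```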

Is there any way to evenly distribute the results across the slices? I believe this may be the same issue as the one described in https://github.com/elastic/elasticsearch/issues/27550.



It's possible that you are running into that linked issue. If that is the case, there's not much that we can do in terms of balancing the sliced scrolls. You could set the es.input.max.docs.per.partition setting to a very large value such as Integer.MAX_VALUE, which should effectively disable the slicing feature. Note that this only eliminates the overhead of the empty tasks created for the empty slices; it does not rebalance the data itself.
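If you go that route, the configuration might look something like this in a PySpark job. This is a sketch, not a tested fix: the node address and index name are placeholders, though `es.nodes`, `es.resource`, and `es.input.max.docs.per.partition` are real es-hadoop settings:

```python
# Raise es.input.max.docs.per.partition so high that es-hadoop never
# splits a shard into slices: each shard maps to one Spark partition.
MAX_INT = 2**31 - 1  # Java Integer.MAX_VALUE

es_read_conf = {
    "es.nodes": "localhost:9200",   # placeholder node address
    "es.resource": "my_index",      # placeholder index name
    # With the per-partition cap effectively unbounded, the 50k limit
    # is never hit, so no sliced scrolls (and no empty tasks) are created.
    "es.input.max.docs.per.partition": str(MAX_INT),
}

# Typical usage (requires a running SparkSession and cluster):
# df = (spark.read
#       .format("org.elasticsearch.spark.sql")
#       .options(**es_read_conf)
#       .load())
```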