How to cover all conditions when fetching all data with scroll?

riverbuilding · December 31, 2019, 3:22pm

Based on this discussion:

I quote the relevant content below:
Short answer: yes, if you have a single shard index (as seems to be in your case) - it is expected behavior, but it can happen even if you have multiple shards. Longer answer: the scroll basically contains a list of shards where your search is running plus information about how to find your scroll data on each shard. As you exhaust results from each shard, you will notice that the scroll id becomes shorter, because we no longer need to search these shards and therefore don't need to list them on scroll. But if you only have one shard, or all shards get processed at the same time, your scroll id might never change. Saying this, I wouldn't rely on this behavior since it might change in the future and always copy scroll id from the previous response.

So my question is: given that the scrollId may stay the same under certain conditions, how should I write code that fetches all data exactly once? (To keep things simple, let's assume the data is not modified while it is being fetched.)

In my case, I have an index with 1 shard, and the iteration loop below seems to be infinite:
while (searchHits != null && searchHits.length > 0) {
...
}

DavidTurner · December 31, 2019, 7:28pm

When used as documented, a scroll already returns each document once.

riverbuilding · December 31, 2019, 7:37pm

In my case, I have an index with 1 shard, and the iteration loop below seems to be infinite:
while (searchHits != null && searchHits.length > 0) {
...
}
How do I deal with an infinite loop when there is 1 shard? How should I handle this situation, which I have actually run into?

riverbuilding · January 2, 2020, 8:25pm

According to this link, we are supposed to discuss the issue here.

So let me add more info:

I did run into an infinite loop with the sample code in the docs.
I do have one shard, which I guess is the reason for the infinite loop.
My ES version is 7.3.1.
I use the ES high level REST client.

What else do you want to know?

DavidTurner · January 2, 2020, 10:00pm

The code in the docs terminates correctly after retrieving all documents. It's checked as part of the test suite. The number of shards isn't relevant. I am guessing you've altered this code somehow and now it's not working? You haven't really shared enough information to reproduce what you're seeing.

riverbuilding · January 2, 2020, 11:04pm

Let me clarify my situation:

I have N documents in one index with only one shard.
I used search-scroll-example but added this line: searchSourceBuilder.size(SCROLL_SIZE);
Here, SCROLL_SIZE > N.
Then I fall into an infinite loop in which the same scrollId keeps showing up.

What other info do you need?

DavidTurner · January 3, 2020, 12:53pm

I tried doing what you describe:

diff --git a/client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/SearchDocumentationIT.java b/client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/SearchDocumentationIT.java
index 995a50508fc..6fba679b66e 100644
--- a/client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/SearchDocumentationIT.java
+++ b/client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/SearchDocumentationIT.java
@@ -709,6 +709,7 @@ public class SearchDocumentationIT extends ESRestHighLevelClientTestCase {
             searchRequest.scroll(scroll);
             SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
             searchSourceBuilder.query(matchQuery("title", "Elasticsearch"));
+            searchSourceBuilder.size(between(1, 10));
             searchRequest.source(searchSourceBuilder);

             SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT); // <1>

In the test there are three docs, so this covers the cases where the size is both smaller and larger than the number of docs. It still passes.
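
To illustrate why the loop ends no matter what the scroll ID does, here is a minimal sketch that runs without a cluster. It is not the test from the docs: `ScrollLoopSketch`, `Page`, `FixedIdSource` and `fetchAll` are hypothetical names, and `FixedIdSource` is an in-memory stand-in that deliberately returns the same scroll ID on every response. The loop has the same shape as the `while (searchHits != null && searchHits.length > 0)` loop quoted earlier in the thread, and it assumes the hits and the scroll ID are both refreshed from each new response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScrollLoopSketch {

    /** Hypothetical stand-in for a scroll response: a scroll ID plus one page of hits. */
    static final class Page {
        final String scrollId;
        final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Hypothetical in-memory source that returns the SAME scroll ID on every response. */
    static final class FixedIdSource {
        private final List<String> docs;
        private final int size;
        private int cursor = 0;

        FixedIdSource(List<String> docs, int size) {
            this.docs = docs;
            this.size = size;
        }

        /** Returns the next page of at most `size` docs, or an empty page once exhausted. */
        Page next(String scrollId) {
            // The incoming scroll ID is ignored here; position is tracked by the cursor.
            int end = Math.min(cursor + size, docs.size());
            List<String> hits = new ArrayList<>(docs.subList(cursor, end));
            cursor = end;
            return new Page("same-id", hits);
        }
    }

    /** Drains the source page by page and returns every document that was seen. */
    static List<String> fetchAll(List<String> docs, int size) {
        FixedIdSource source = new FixedIdSource(docs, size);
        List<String> seen = new ArrayList<>();

        Page page = source.next(null);      // stands in for the initial search
        String scrollId = page.scrollId;
        List<String> hits = page.hits;

        // Termination depends on an empty page, not on the scroll ID changing.
        while (hits != null && !hits.isEmpty()) {
            seen.addAll(hits);
            page = source.next(scrollId);   // stands in for the follow-up scroll request
            scrollId = page.scrollId;       // always copy the ID from the latest response
            hits = page.hits;               // refresh the hits so the condition can change
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("doc1", "doc2", "doc3");
        // Covers sizes both smaller and larger than the number of docs.
        for (int size = 1; size <= 10; size++) {
            System.out.println("size=" + size + " fetched=" + fetchAll(docs, size));
        }
    }
}
```

In this sketch every size from 1 to 10 returns the three documents exactly once, even though the scroll ID never changes. If the line that refreshes `hits` were dropped, the loop condition could never become false, which is one possible way an altered copy of the loop might spin forever.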

system · January 31, 2020, 12:53pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
