How to cover all conditions when fetching all data with scroll?

Based on this discussion:

The content I am quoting is below:
Short answer: yes, if you have a single shard index (as seems to be in your case) - it is expected behavior, but it can happen even if you have multiple shards. Longer answer: the scroll basically contains a list of shards where your search is running plus information about how to find your scroll data on each shard. As you exhaust results from each shard, you will notice that the scroll id becomes shorter, because we no longer need to search these shards and therefore don't need to list them on scroll. But if you only have one shard, or all shards get processed at the same time, your scroll id might never change. That said, I wouldn't rely on this behavior since it might change in the future, and I would always copy the scroll id from the previous response.

So my question is: given that the scrollId may stay the same under certain conditions, how should I write code that fetches all data exactly once? (To keep things simple, let's assume the data is not modified while it is being fetched.)

In my case, I have a single-shard index, and the iteration loop below seems to be infinite:
while (searchHits != null && searchHits.length > 0) {
...
}

When used as documented, a scroll already returns each document once.

How do I deal with the infinite loop when there is only one shard? This is a situation I have actually run into.

According to this link, we are supposed to discuss the issue here.

So let me add more info:

  1. I did hit an infinite loop with the sample code in the docs.
  2. I have one shard, which I guess is what caused the infinite loop.
  3. My ES version is 7.3.1.
  4. I use the ES high level REST client.

What else do you want to know?

The code in the docs terminates correctly after retrieving all documents. It's checked as part of the test suite. The number of shards isn't relevant. I am guessing you've altered this code somehow and now it's not working? You haven't really shared enough information to reproduce what you're seeing.

Let me clarify my situation:

  1. I have N documents in one index with only one shard.
  2. I used search-scroll-example but added this line: searchSourceBuilder.size(SCROLL_SIZE);
  3. Here, SCROLL_SIZE > N.
  4. I then fall into an infinite loop, with the same scrollId showing up every time.

What other info do you need?

I tried doing what you describe:

diff --git a/client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/SearchDocumentationIT.java b/client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/SearchDocumentationIT.java
index 995a50508fc..6fba679b66e 100644
--- a/client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/SearchDocumentationIT.java
+++ b/client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/SearchDocumentationIT.java
@@ -709,6 +709,7 @@ public class SearchDocumentationIT extends ESRestHighLevelClientTestCase {
             searchRequest.scroll(scroll);
             SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
             searchSourceBuilder.query(matchQuery("title", "Elasticsearch"));
+            searchSourceBuilder.size(between(1, 10));
             searchRequest.source(searchSourceBuilder);

             SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT); // <1>

In the test there are three docs, so this covers the cases where the size is both smaller and larger than the number of docs. It still passes.
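To illustrate why an unchanging scroll id should not matter for termination, here is a minimal stand-alone sketch. It does not use the Elasticsearch client at all: `FakeScrollBackend` and `Page` are hypothetical stand-ins I made up to model a single shard that hands back the same scroll id on every page. The loop has the same shape as the one in the docs: it stops when a page comes back with no hits, and it copies the scroll id from each response rather than comparing ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScrollLoopSketch {

    /** Hypothetical stand-in for a scroll response: one page of hits plus the id to send next. */
    static final class Page {
        final String scrollId;
        final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Hypothetical single-shard backend whose scroll id never changes between pages. */
    static final class FakeScrollBackend {
        private final List<String> docs;
        private final int size;
        private int cursor = 0;
        int calls = 0; // number of search/scroll round trips made so far

        FakeScrollBackend(List<String> docs, int size) {
            this.docs = docs;
            this.size = size;
        }

        /** Models the initial search request that opens the scroll. */
        Page search() {
            return nextPage();
        }

        /** Models a follow-up scroll request; the id is accepted but always the same. */
        Page scroll(String scrollId) {
            return nextPage();
        }

        private Page nextPage() {
            calls++;
            int end = Math.min(cursor + size, docs.size());
            List<String> hits = new ArrayList<>(docs.subList(cursor, end));
            cursor = end;
            return new Page("same-id", hits);
        }
    }

    /** Drains the scroll. Termination depends only on an empty page, never on the id changing. */
    static List<String> fetchAll(FakeScrollBackend backend) {
        List<String> seen = new ArrayList<>();
        Page page = backend.search();
        String scrollId = page.scrollId;
        List<String> hits = page.hits;
        while (hits != null && !hits.isEmpty()) {
            seen.addAll(hits);
            page = backend.scroll(scrollId); // ask for the NEXT page, do not re-run the search
            scrollId = page.scrollId;        // always copy the id from the latest response
            hits = page.hits;                // reassign hits, otherwise the loop never ends
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("doc1", "doc2", "doc3");
        // Sizes both smaller and larger than the number of docs, as in the test above.
        for (int size : new int[] {1, 2, 3, 10}) {
            FakeScrollBackend backend = new FakeScrollBackend(docs, size);
            System.out.println("size=" + size + " fetched=" + fetchAll(backend)
                    + " calls=" + backend.calls);
        }
    }
}
```

In this model every size collects the three documents exactly once, even though the id is identical on every call. If a real loop never ends, two things worth checking (a guess, not a diagnosis) are whether the hits variable is reassigned from the scroll response inside the loop, and whether the loop body calls scroll rather than repeating the original search.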
