SearchScroll hangs when dealing with a large number of hits

Hi,

I'm using the Java API to index and search in Elasticsearch. I noticed that
when the search result is very large (for instance, more than 200,000
hits), the scroll hangs after it has iterated through about 50,000 records.

For instance, I have the following code to produce a search response:

SearchResponse scrollResp = client.prepareSearch()
        .setIndices(indexName)
        .setTypes(ES_TYPE)
        .setSearchType(SearchType.SCAN)
        .setQuery(buildQuery)
        .setScroll(new TimeValue(1000))
        .setSize(1000)
        .execute().actionGet();

and then I do the following to scroll through the results:

while (true) {
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(1000))
            .execute().actionGet();
    boolean hitsRead = false;
    for (SearchHit hit : scrollResp.getHits()) {
        hitsRead = true;
        def sourceMap = hit.getSource();
        sourceMap._id = hit.getId();
        resultSet.result.add(sourceMap);
    }
    if (!hitsRead) {
        break;
    }
}

The prepareSearchScroll() call tends to hang, or sometimes the for loop
does while reading the next record. Can't Elasticsearch handle such a
large number of hits, or should the search be restricted to produce
fewer hits?

br, Piotr

Scan is aimed at scrolling through large amounts of data. Do you see any
failures in the logs on the cluster nodes? Iterating through the hits will
definitely not hang, since they are all already present in the search
response; maybe your client does not have enough memory allocated to it to
do the scrolling? Try using a smaller size (like 100) and see if it helps.
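As a side note, a minimal sketch of the "scroll until an empty page comes back" pattern discussed above, with the smaller page size of 100 as suggested. This is not the Elasticsearch client itself: ScrollLoop, fetchPage, TOTAL_HITS and PAGE_SIZE are hypothetical stand-ins so the loop shape can be seen in isolation. One more thing worth checking in the original code (an observation, not confirmed by this thread): setScroll(new TimeValue(1000)) keeps the scroll context alive for only 1000 milliseconds between requests, which may be too short if each batch takes longer than that to process; a more generous keep-alive, e.g. TimeValue.timeValueMinutes(10), is safer.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrollLoop {

    static final int TOTAL_HITS = 250;  // pretend the query matched 250 docs
    static final int PAGE_SIZE = 100;   // smaller page size, as suggested

    // Stand-in for prepareSearchScroll(...).execute().actionGet():
    // returns the next page of hit ids, or an empty list when exhausted.
    static List<Integer> fetchPage(int from) {
        List<Integer> page = new ArrayList<>();
        for (int i = from; i < Math.min(from + PAGE_SIZE, TOTAL_HITS); i++) {
            page.add(i);
        }
        return page;
    }

    // Same loop shape as the posted code: keep asking for the next page
    // and stop as soon as a page comes back empty.
    static List<Integer> scrollAll() {
        List<Integer> all = new ArrayList<>();
        while (true) {
            List<Integer> page = fetchPage(all.size());
            if (page.isEmpty()) {
                break;              // an empty page ends the scroll
            }
            all.addAll(page);       // accumulate, like resultSet.result.add(...)
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(scrollAll().size()); // 250
    }
}
```

With a real cluster the only difference is that fetchPage becomes the prepareSearchScroll call, and the exit condition is scrollResp.getHits() having zero hits.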

On Mon, May 14, 2012 at 4:28 PM, piotrs piotr.skawinski@gmail.com wrote:
