I'm using the Java API to index and search in elasticsearch. I noticed that
when the search results are extremely big (for instance, more than 200,000
hits), the scroll hangs after it has iterated through about 50,000 records.
For instance, I have the following code to produce a search response:
and then I do the following to scroll through the results:
while (true) {
    // Fetch the next batch of hits for this scroll id
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(1000))
            .execute().actionGet();
    boolean hitsRead = false;
    for (SearchHit hit : scrollResp.getHits()) {
        hitsRead = true;
        def sourceMap = hit.getSource();
        sourceMap._id = hit.getId();
        resultSet.result.add(sourceMap);
    }
    // An empty batch means the scroll is exhausted
    if (!hitsRead) {
        break;
    }
}
The prepareSearchScroll() call tends to hang, or sometimes the for loop hangs
while reading the next record. Can't elasticsearch handle such a big number of
hits, or should the search be restricted to produce fewer hits?
Scan is aimed at scrolling through large amounts of data. Do you see any failures
in the logs on the cluster nodes? Iterating through the hits
will definitely not hang, since they are all already present in the search
response; maybe your client does not have enough memory allocated to it to do
the scrolling? Try using a smaller size (like 100) and see if it helps.
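
Roughly something like the sketch below: a scan-type search with a small page
size, followed by the same kind of scroll loop. This is an untested sketch, not
your exact code: the index name "myindex", the match_all query, and the 60-second
keep-alive are placeholders to adjust to your setup, and it assumes the usual
Java client imports (SearchResponse, SearchType, QueryBuilders, TimeValue,
SearchHit) plus an existing client instance.

// Sketch only: placeholder index, query, and keep-alive values
SearchResponse scrollResp = client.prepareSearch("myindex")      // placeholder index name
        .setSearchType(SearchType.SCAN)                          // scan: no scoring/sorting, just bulk retrieval
        .setScroll(new TimeValue(60000))                         // keep the scroll context alive for 60s (assumption)
        .setQuery(QueryBuilders.matchAllQuery())                 // placeholder query
        .setSize(100)                                            // 100 hits per shard per scroll round-trip
        .execute().actionGet();

while (true) {
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000))                     // renew the keep-alive on every round-trip
            .execute().actionGet();
    if (scrollResp.getHits().getHits().length == 0) {
        break;                                                   // no more hits: the scroll is exhausted
    }
    for (SearchHit hit : scrollResp.getHits()) {
        // process hit.getSource() / hit.getId() here
    }
}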