I'm scrolling the query result using the approach described here
(elasticsearch version 0.10.0).
For some reason the total hits number unexpectedly decreases during
scrolling, which of course leads to processing less data than
expected.
Here's my Scala code snippet that does all this work:
val scrollKeepAliveTime = new TimeValue(5, TimeUnit.MINUTES)
var searchResponse = esClient.prepareSearch(INDEX_NAME).
  setTypes(myType).
  setQuery(myBoolQuery).
  setScroll(scrollKeepAliveTime).
  execute.actionGet
while (searchResponse.hits.hits.length > 0) {
  for (hit <- searchResponse.hits.hits) {
    // Take 'hit' and do some magic
  }
  searchResponse =
Example:
When the searchResponse is first obtained by executing
esClient.prepareSearch(...), it shows a total of 14 hits. The
default scroll size is 10, so I process the first 10 hits and request the
next portion of data via
esClient.prepareSearchScroll(searchResponse.scrollId). But instead of the
expected 4 hits I get only 2, and 'searchResponse.hits().totalHits' now
shows only 12 hits (instead of 14).
This behaviour is not 100% repeatable. Sometimes 14 becomes 12,
sometimes 11, and only rarely do I get all 14.
Am I doing something wrong?
Another question: inside the for() loop in the above code, "do
some magic" means making some changes to the retrieved
document and re-indexing it:
esClient.prepareIndex(INDEX_NAME, myType, hit.id).
setSource(updatedDocument).
execute.actionGet
Will this work well during scrolling? I mean, won't the modifications
conflict with the scrolling "cursor"?
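To make the shortfall visible instead of silent, the loop can be wrapped in a small sanity check that compares the total reported by the first response with the number of hits actually processed. This is only a sketch: `Page`, `ScrollCheck`, and `drain` are hypothetical names, and the simulated pages stand in for real client calls so that the snippet runs on its own. The numbers mirror the 14 / 10 + 2 example above.

```scala
// Hypothetical stand-in for one page of a scroll response; not an Elasticsearch type.
final case class Page(scrollId: String, totalHits: Long, hits: Seq[String])

object ScrollCheck {
  /** Drains a scroll and returns (expected, processed).
    * `first` is the initial search response; `next` fetches the page for a scroll id.
    */
  def drain(first: Page, next: String => Page)(handle: String => Unit): (Long, Long) = {
    val expected = first.totalHits // total reported before any scrolling
    var page = first
    var processed = 0L
    while (page.hits.nonEmpty) {
      page.hits.foreach(handle)
      processed += page.hits.size
      page = next(page.scrollId)
    }
    (expected, processed)
  }

  def main(args: Array[String]): Unit = {
    // Simulated pages: 14 hits reported at first, but only 10 + 2 delivered.
    val pages = Map(
      "s1" -> Page("s2", 12, Seq("doc11", "doc12")),
      "s2" -> Page("s3", 12, Seq.empty)
    )
    val first = Page("s1", 14, (1 to 10).map(i => s"doc$i"))
    val (expected, processed) = drain(first, pages)(_ => ())
    if (processed < expected)
      println(s"scroll came up short: processed $processed of $expected hits")
  }
}
```

With a real client, `next` would wrap the prepareSearchScroll call, and `handle` would be the "do some magic" step.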
I am experiencing this scrolling error as well with code following the same
pattern as your example. It has possibly been introduced in 0.9.0 because I
wrote the code against 0.8 and seem to remember it working as I expected.
Looking closer at the SearchResponse returned by the first scrolling
operation, it consistently shows a single shard failure (4 out of 5
successful shards) with an exception message like this:
shard [_na], reason [SearchContextMissingException[No search context found
for id [5], timed out]]
It seems to me as though the failed shard could be holding the entries that
mysteriously disappear after the first scroll.
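Given that, it may be worth refusing to process a page unless every shard answered. Below is a minimal sketch of such a guard. `ShardStats` and `ShardCheck` are hypothetical names that only model the shard counts and failure messages reported on the response, so the snippet is self-contained and does not depend on the client library.

```scala
// Hypothetical summary of the shard section of a search response; not an Elasticsearch type.
final case class ShardStats(total: Int, successful: Int, failureReasons: Seq[String])

object ShardCheck {
  /** Returns Right(()) when every shard answered, otherwise Left with a description. */
  def verify(stats: ShardStats): Either[String, Unit] =
    if (stats.successful == stats.total) Right(())
    else Left(
      s"${stats.total - stats.successful} of ${stats.total} shards failed: " +
        stats.failureReasons.mkString("; "))

  def main(args: Array[String]): Unit = {
    // The case described above: 4 out of 5 shards successful.
    val stats = ShardStats(5, 4,
      Seq("SearchContextMissingException[No search context found for id [5], timed out]"))
    verify(stats) match {
      case Left(problem) => println(problem) // stop rather than process a partial page
      case Right(_)      => println("all shards answered")
    }
  }
}
```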
Hi,
it seems (to me) scrolling doesn't work as expected (not retrieving all
matched documents) when using the Java API on an index with more than one
shard (for a single-shard index it seems to work fine).
Is there any chance this issue will be addressed in the near future, or
is the 'from' parameter workaround the way to go?
Also, can you see any changes made after you started the first 'scroll'
when using the 'from' parameter workaround (or is it the same as scrolling)?
Tomislav
On Fri, 2010-09-24 at 19:25 +0200, Clinton Gormley wrote:
Hiya
I'm scrolling the query result using the approach described here
Yea, I plan to address the scrolling issue. Regarding
the visibility aspect, it's a point-in-time scroll, so you won't see any
changes happening after the first search+scroll request has been executed.
-shay.banon