'totalHits' gets changed unexpectedly while scrolling SearchResponse


(Alex Wajda) #1

Hi guys,

I'm scrolling the query result using the approach described here


(elasticsearch version 0.10.0)

For some reason the total hit count unexpectedly decreases during
scrolling, which of course means less data gets processed than
expected.

Here's my Scala code snippet that does all this work:

val scrollKeepAliveTime = new TimeValue(5, TimeUnit.MINUTES)
var searchResponse = esClient.prepareSearch(INDEX_NAME).
        setTypes(myType).
        setQuery(myBoolQuery).
        setScroll(scrollKeepAliveTime).
        execute.actionGet

while (searchResponse.hits.hits.length > 0) {
    for (hit <- searchResponse.hits.hits) {
        // Take 'hit' and do some magic
    }
    searchResponse = esClient.prepareSearchScroll(searchResponse.scrollId).
            setScroll(scrollKeepAliveTime).
            execute.actionGet
}

Example:
When the searchResponse is first obtained by executing
esClient.prepareSearch(...), it reports 14 total hits. The
default scroll size is 10, so I process the first 10 hits and request
the next batch via
esClient.prepareSearchScroll(searchResponse.scrollId), but instead of
the expected 4 hits I get only 2, and 'searchResponse.hits().totalHits'
now shows only 12 hits (instead of 14).

And this behaviour is not 100% repeatable. Sometimes 14 becomes 12,
sometimes 11, and only rarely do I get all 14.
Am I doing something wrong?

And another question: inside the for() loop in the above code, "do
some magic" is going to mean making some changes to the retrieved
document and re-indexing it:
esClient.prepareIndex(INDEX_NAME, myType, hit.id).
        setSource(updatedDocument).
        execute.actionGet
Will this work well during scrolling? I mean, won't the modifications
conflict with the scrolling "cursor"?

Thank you!
Alex.


(Clinton Gormley) #2

Hiya

I'm scrolling the query result using the approach described here

Scrolling is still broken for me:
http://github.com/elasticsearch/elasticsearch/issues#issue/136

I'd avoid it for now. You can replicate its function using the 'from'
parameter.

clint
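
For anyone trying the 'from' workaround, here is a minimal sketch of the paging loop. It is not code from this thread: `Page`, `fetchAll` and `fakeSearch` are hypothetical names, and the in-memory `docs` list only stands in for an index. With the Java client, the `search` function would presumably wrap a prepareSearch(...) call that sets the offset and page size (setFrom/setSize are assumed here).

```scala
object FromPaging {
  // Hypothetical stand-in for one page of results; not an Elasticsearch type.
  final case class Page(totalHits: Int, hits: Seq[String])

  // Pages through results with from/size until a page comes back empty.
  // `search` takes (from, size) and returns that slice of the results.
  def fetchAll(pageSize: Int)(search: (Int, Int) => Page): Seq[String] = {
    require(pageSize > 0, "pageSize must be positive")
    val collected = Seq.newBuilder[String]
    var from = 0
    var page = search(from, pageSize)
    while (page.hits.nonEmpty) {
      collected ++= page.hits
      from += page.hits.length // advance by what was actually returned
      page = search(from, pageSize)
    }
    collected.result()
  }

  // In-memory "index" of 14 documents, used only to exercise the loop.
  val docs: Seq[String] = (1 to 14).map(i => s"doc-$i")

  def fakeSearch(from: Int, size: Int): Page =
    Page(docs.length, docs.slice(from, from + size))
}
```

Calling FromPaging.fetchAll(10)(FromPaging.fakeSearch) walks the 14 documents in pages of 10 and 4. Note that each from/size request is a separate search, so unlike a scroll it can see changes made between requests; a stable sort order helps keep pages from overlapping.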


(DKichler) #3

I am experiencing this scrolling error as well with code following the same
pattern as your example. It has possibly been introduced in 0.9.0 because I
wrote the code against 0.8 and seem to remember it working as I expected.

Looking closer at the SearchResponse returned by the first scrolling
operation, it consistently shows a single shard failure (4 out of 5
successful shards) with an exception message like this:
shard [_na], reason [SearchContextMissingException[No search context found
for id [5], timed out]]

It seems to me as though the failed shard could be holding the entries that
mysteriously disappear after the first scroll.
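
One way to catch this early is to check every scroll page for failed shards and for a changed total before processing it. The following is only a sketch under assumptions: `ScrollPage` and `drain` are hypothetical names modelling the few response fields involved, and the corresponding accessors on the real SearchResponse may be named differently depending on the version.

```scala
import scala.annotation.tailrec

object ScrollGuard {
  // Hypothetical stand-in for the parts of a scroll response used here.
  final case class ScrollPage(totalShards: Int, successfulShards: Int,
                              totalHits: Long, hits: Seq[String])

  // Drains a scroll, stopping with Left(reason) as soon as a page reports
  // a failed shard or a total that differs from the first page's total.
  // `next` is a hypothetical function that fetches the following page.
  def drain(first: ScrollPage)(next: () => ScrollPage): Either[String, Seq[String]] = {
    val expectedTotal = first.totalHits

    @tailrec
    def loop(page: ScrollPage, acc: Vector[String]): Either[String, Seq[String]] = {
      val failed = page.totalShards - page.successfulShards
      if (failed > 0)
        Left(s"$failed of ${page.totalShards} shards failed")
      else if (page.totalHits != expectedTotal)
        Left(s"total hits changed from $expectedTotal to ${page.totalHits}")
      else if (page.hits.isEmpty)
        Right(acc) // an empty page ends the scroll
      else
        loop(next(), acc ++ page.hits)
    }

    loop(first, Vector.empty)
  }
}
```

With a second page that reports 4 of 5 successful shards, as described above, `drain` returns a Left instead of silently processing fewer documents.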

On Fri, Sep 24, 2010 at 10:25 AM, Clinton Gormley [via ElasticSearch Users] wrote:

Hiya

I'm scrolling the query result using the approach described here

Scrolling is still broken for me:
http://github.com/elasticsearch/elasticsearch/issues#issue/136

I'd avoid it for now. You can replicate its function using the 'from'
parameter.

clint




(Tomislav Poljak) #4

Hi,
it seems (to me) that scrolling doesn't work as expected (it doesn't
retrieve all matched documents) when using the Java API on an index with
more than one shard (for a single-shard index it seems to work fine).

Is there any chance this issue will be addressed in the near future, or
is the 'from' parameter workaround the way to go?

Also, can you see changes made after you started the first 'scroll'
when using the 'from' parameter workaround (or is it the same as scrolling)?

Tomislav



(Shay Banon) #5

Hey,

Yes, I plan to address the scrolling issue. Regarding
the visibility aspect, it's a point-in-time scroll, so you won't see any
changes that happen after the first search+scroll request has been executed.

-shay.banon
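
To illustrate what "point in time" means for the re-indexing question earlier in the thread, here is a toy model. It makes no claim about how Elasticsearch implements scrolling: a mutable map plays the live index, an immutable copy plays the scroll's view, and `reindexWhileScrolling` is a hypothetical name.

```scala
import scala.collection.mutable

object PointInTime {
  // Toy model only: iterates a snapshot taken up front while writing
  // updated documents back to the live map.
  def reindexWhileScrolling(live: mutable.Map[String, String]): Seq[String] = {
    // The view is fixed when the "search" starts.
    val snapshot: Map[String, String] = live.toMap
    snapshot.keys.toSeq.sorted.map { id =>
      live(id) = snapshot(id).toUpperCase // "re-index" the modified document
      snapshot(id)                        // the scroll still yields the original
    }
  }
}
```

In this model the writes land in the live map, while the values returned by the iteration are the ones captured when it started.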


