'totalHits' gets changed unexpectedly while scrolling SearchResponse

Alex_Wajda · September 24, 2010, 5:17pm

Hi guys,

I'm scrolling the query result using the approach described here

https://gist.github.com/kimchy/517664


SearchResponse searchResponse = client.prepareSearch()
        .setQuery(matchAllQuery())
        .setSize(35)
        .setScroll(TimeValue.timeValueMinutes(2))
        .addSort("field", SortOrder.ASC)
        .execute().actionGet();

assertThat(searchResponse.hits().getTotalHits(), equalTo(100l));
assertThat(searchResponse.hits().hits().length, equalTo(35));

(The gist excerpt is truncated here.)

(elasticsearch version 0.10.0)

For some reason the total hit count unexpectedly decreases during
scrolling, which of course leads to processing less data than
expected.

Here's my Scala code snippet that does all this work:

val scrollKeepAliveTime = new TimeValue(5, TimeUnit.MINUTES)
var searchResponse = esClient.prepareSearch(INDEX_NAME).
        setTypes(myType).
        setQuery(myBoolQuery).
        setScroll(scrollKeepAliveTime).
        execute.actionGet

while (searchResponse.hits.hits.length > 0) {
    for (hit <- searchResponse.hits.hits) {
        // Take 'hit' and do some magic
    }
    searchResponse = esClient.prepareSearchScroll(searchResponse.scrollId).
            setScroll(scrollKeepAliveTime).
            execute.actionGet
}

Example:
When the searchResponse is first obtained by executing
esClient.prepareSearch(...), it reports 14 total hits. The
default scroll size is 10, so I process the first 10 hits and request the
next batch via
esClient.prepareSearchScroll(searchResponse.scrollId). But instead of the
expected 4 hits I get only 2, and 'searchResponse.hits().totalHits' now
reports only 12 hits (instead of 14).

And this behaviour is not 100% repeatable. Sometimes 14 becomes 12,
sometimes 11, and only rarely do I get all 14.
Am I doing something wrong?

Another question: inside the for() loop in the above code, "do
some magic" is going to mean making some changes to the retrieved
document and re-indexing it:
esClient.prepareIndex(INDEX_NAME, myType, hit.id).
        setSource(updatedDocument).
        execute.actionGet
Will this work well during scrolling? I mean, won't the modifications
conflict with the scrolling "cursor"?

Thank you!
Alex.
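
For anyone trying to reproduce this, here is a minimal Java sketch of the bookkeeping involved: compare the number of hits actually processed against the total reported by the first response. The Page class is a hypothetical stand-in for a scroll response (it is not an Elasticsearch type); with the real client, the two numbers would presumably come from the hits object as in the snippets above.

```java
import java.util.Arrays;
import java.util.List;

class ScrollCountCheck {
    /** Hypothetical stand-in for one scroll response: reported total and hits in this batch. */
    static final class Page {
        final long totalHits;
        final int hitCount;

        Page(long totalHits, int hitCount) {
            this.totalHits = totalHits;
            this.hitCount = hitCount;
        }
    }

    /** Returns how many hits went missing relative to the total of the first response. */
    static long shortfall(List<Page> pages) {
        if (pages.isEmpty()) {
            return 0;
        }
        long expected = pages.get(0).totalHits; // total reported by the initial search
        long processed = 0;
        for (Page page : pages) {
            if (page.hitCount == 0) {
                break; // same stop condition as the while loop in the post
            }
            processed += page.hitCount;
        }
        return expected - processed;
    }

    public static void main(String[] args) {
        // The case described in the post: 14 reported, then 10 hits, then only 2.
        List<Page> observed = Arrays.asList(new Page(14, 10), new Page(12, 2), new Page(12, 0));
        System.out.println(shortfall(observed)); // prints 2
    }
}
```

A non-zero result means the scroll ended before delivering everything the first response promised, which is a cheap check to log at the end of a scroll loop.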

Clinton_Gormley · September 24, 2010, 5:25pm

Hiya

I'm scrolling the query result using the approach described here

Scrolling is still broken for me (see the elastic/elasticsearch issues on GitHub).

I'd avoid it for now. You can replicate its function using the 'from'
parameter.

clint
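
A minimal Java sketch of the 'from' workaround mentioned above. The searchPage function is a hypothetical stand-in for a search request with a given offset and page size; with the real client this would presumably be a prepareSearch call that sets a from offset next to the setSize(...) seen in the gist, but that mapping is an assumption. The page helper is an in-memory stand-in so the loop can run without a cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;

class FromPagingSketch {
    /**
     * Fetches every hit by advancing an offset until an empty page comes back.
     * searchPage takes (from, size) and returns that page of hits (hypothetical stand-in).
     */
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> searchPage, int pageSize) {
        List<T> all = new ArrayList<>();
        int from = 0;
        while (true) {
            List<T> page = searchPage.apply(from, pageSize);
            if (page.isEmpty()) {
                break; // no more hits at this offset
            }
            all.addAll(page);
            from += page.size(); // advance by what was actually returned
        }
        return all;
    }

    /** In-memory stand-in for an index: returns the slice [from, from + size) of the list. */
    static List<String> page(List<String> index, int from, int size) {
        if (from >= index.size()) {
            return Collections.emptyList();
        }
        return new ArrayList<>(index.subList(from, Math.min(from + size, index.size())));
    }
}
```

Each page here is a separate search, so unlike a scroll it is not a fixed view of the index: documents indexed or changed between requests can shift the pages, and a stable sort order is assumed.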

DKichler · October 3, 2010, 8:13pm

I am experiencing this scrolling error as well with code following the same
pattern as your example. It has possibly been introduced in 0.9.0 because I
wrote the code against 0.8 and seem to remember it working as I expected.

Looking closer at the SearchResponse returned by the first scrolling
operation, it consistently shows a single shard failure (4 out of 5
successful shards) with an exception message like this:
shard [_na], reason [SearchContextMissingException[No search context found for id [5], timed out]]

It seems to me as though the failed shard could be holding the entries that
mysteriously disappear after the first scroll.
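
To make that hypothesis concrete (and it is only a hypothesis), here is a small Java sketch of the arithmetic: if the total is summed over the shards that responded, one failed shard lowers it by exactly the number of hits that shard held. The per-shard hit counts below are made up for illustration; only the 14 total and the 5 shards come from this thread.

```java
class ShardTotalsSketch {
    /** Sums hit counts over the shards that responded; a failed shard contributes nothing. */
    static long totalHits(long[] hitsPerShard, boolean[] shardFailed) {
        long total = 0;
        for (int i = 0; i < hitsPerShard.length; i++) {
            if (!shardFailed[i]) {
                total += hitsPerShard[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] hitsPerShard = {3, 3, 2, 3, 3};              // hypothetical spread of 14 hits over 5 shards
        boolean[] noFailures = new boolean[5];               // all false: every shard responded
        boolean[] oneFailure = {false, false, true, false, false};

        System.out.println(totalHits(hitsPerShard, noFailures)); // prints 14
        System.out.println(totalHits(hitsPerShard, oneFailure)); // prints 12
    }
}
```

Under this reading, which shard loses its search context would determine whether 14 becomes 12 or 11, which would fit the non-repeatable numbers reported earlier in the thread.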

Tomislav_Poljak · October 11, 2010, 7:07pm

Hi,
it seems (to me) that scrolling doesn't work as expected (it doesn't retrieve all
matched documents) when using the Java API on an index with more than one shard
(for a single-shard index it seems to work fine).

Is there any chance this issue will be addressed in the near future, or
is the 'from' parameter workaround the way to go?

Also, can you see any changes made after you started the first 'scroll'
when using the 'from' parameter workaround (or is it the same as scrolling)?

Tomislav

kimchy · October 11, 2010, 7:17pm

Hey,

Yea, I plan to address the scrolling issue. Regarding
the visibility aspect, it's a point-in-time scroll, so you won't see any
changes happening after the first search+scroll request has been executed.

-shay.banon
