SearchHits.totalHits() vs. SearchHits.getHits().length

Hi guys,

I am confused, because querying my test index returns a different number
of results on every run.

I have an index A with three types A1, A2 and A3. All entries have the same
source -> FIELD01:"This is a smal text". Every type has 100 entries.

If I query the whole index with a native Lucene query string such as
FIELD01:"This is a smal text" via prepareSearch() on the client with
SearchType.SCAN, SearchResponse.getHits().totalHits() reports 300 hits.

But when I scroll through the result via prepareSearchScroll(), the
SearchHit[] array has a different length every time I run the test.

What is my problem? Why does SearchResp.getHits().totalHits() return 300
while SearchResp.getHits().getHits().length returns a different number < 300
on every run?

Here is a small code snippet for better understanding:

SearchResponse scrollResp = client.prepareSearch(index)
        .setTypes(types)
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(QueryBuilders.queryString(luceneQuery))
        .setSize(50)
        .execute().actionGet();

while (true)
{
    scrollResp = this.esClient.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000))
            .execute().actionGet();
    // -> 300 on every run
    logger.debug("[searchByLuceneQuery] [{}] record(s) found in scrolling searchResponse",
            scrollResp.getHits().totalHits());

    for (final SearchHit hit : scrollResp.getHits())
    {
        if (!hit.isSourceEmpty())
        {
            ......
        }
    }

    // -> never 300, always < 300
    if (scrollResp.getHits().getHits().length == 0)
    {
        break;
    }
}

Best regards
Alex

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Here is a post from 2010 with similar behavior:
http://elasticsearch-users.115913.n3.nabble.com/totalHits-gets-changed-unexpectedly-while-scrolling-SearchResponse-td1575408.html

Is it still a problem?

On Tuesday, 22 October 2013 19:10:38 UTC+2, Alex wrote:

[...]

Please note that with setSize() you requested the number of results to
fetch per shard. You have used 50, so you will only get 300 hits in a single
scroll response if you have 6 shards.

totalHits() is not related to scrolling at all; it reports the size of the
whole result set.
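
To make the arithmetic concrete, here is a minimal sketch in plain Java. It
does not call Elasticsearch at all; it only simulates the per-shard behaviour
described above. The class name, the method name and the even distribution of
60 documents over 5 shards are assumptions made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation only: models "each shard contributes at most sizePerShard hits
// per scroll response". This is not the Elasticsearch client API.
final class ScanPageSizes {

    // Returns the number of hits carried by each successive scroll response.
    static List<Integer> pageSizes(int[] docsPerShard, int sizePerShard) {
        if (sizePerShard <= 0) {
            throw new IllegalArgumentException("sizePerShard must be positive");
        }
        int[] remaining = docsPerShard.clone();
        List<Integer> pages = new ArrayList<>();
        while (true) {
            int page = 0;
            for (int i = 0; i < remaining.length; i++) {
                // Every shard hands over up to sizePerShard of its remaining hits.
                int taken = Math.min(sizePerShard, remaining[i]);
                remaining[i] -= taken;
                page += taken;
            }
            if (page == 0) {
                break; // an empty response means the scroll is exhausted
            }
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Assumed layout: 300 matching documents spread evenly over 5 shards.
        System.out.println(pageSizes(new int[] {60, 60, 60, 60, 60}, 50)); // [250, 50]
        // With 6 shards of 50 documents each, one response carries all 300.
        System.out.println(pageSizes(new int[] {50, 50, 50, 50, 50, 50}, 50)); // [300]
    }
}
```

In both cases the page sizes add up to 300, which is the value totalHits()
reports on every response.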

Jörg


Thanks Jörg for your hint. I have tried some other approaches and have
found a good solution for me. I think the problem was the searchType
'SCAN'. Now I use the default one and it works fine.

scrollResp = this.esClient.prepareSearch(index)
        .setTypes(types)
        .setQuery(QueryBuilders.queryString(luceneQuery))
        .setSize(fetchSizeInternal)
        .setScroll(new TimeValue(60000))
        .execute().actionGet();

while (scrollResp.getHits().getHits().length > 0)
{
    for (final SearchHit hit : scrollResp.getHits().getHits())
    {
        // do something with the hit
    }

    scrollResp = this.esClient.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000))
            .execute().actionGet();
}

On Tuesday, 22 October 2013 19:59:15 UTC+2, Jörg Prante wrote:

[...]

Hi Alex,

I'm not sure what you are trying to achieve, but typically you'd use
scrolling to go through a complete result set. In that case the SCAN search
type is the one you want, as it is far more efficient with large result sets
(at the price of giving up sorting).

With scrolling you basically iterate over the set of documents that match
your query by reading smaller sets of documents at a time. Think about
going through 1M documents, reading 200 at a time.
scrollResp.getHits().totalHits() returns the total number of documents that
match your query (1M in my example) and
scrollResp.getHits().getHits().length is the size of the chunk you currently
retrieved (200 in my example). As Jörg mentioned, with SCAN the size of that
chunk is at most whatever you've set in the setSize() parameter multiplied
by the number of shards the index has.
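
The difference between the two numbers can be sketched with a small
stand-alone Java example. Page, ScrollSource and fixedPages are hypothetical
stand-ins invented for this illustration and are not part of the
Elasticsearch client. The example uses 1000 documents read 200 at a time
instead of 1M.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-alone illustration: the total stays constant on every page, while the
// chunk length varies and finally drops to 0.
final class ScrollDrainSketch {

    // Hypothetical stand-in for one scroll response.
    static final class Page {
        final long totalHits; // size of the whole result set
        final String[] hits;  // the chunk carried by this response

        Page(long totalHits, String[] hits) {
            this.totalHits = totalHits;
            this.hits = hits;
        }
    }

    // Hypothetical stand-in for the call that fetches the next scroll page.
    interface ScrollSource {
        Page next();
    }

    // Reads pages until an empty one arrives and returns the number of hits seen.
    static long drain(ScrollSource source) {
        long seen = 0;
        Page page = source.next();
        while (page.hits.length > 0) {
            seen += page.hits.length;
            page = source.next();
        }
        return seen;
    }

    // Test helper: serves pages of the given sizes, then empty pages forever.
    static ScrollSource fixedPages(final long totalHits, int... pageSizes) {
        final Deque<Integer> sizes = new ArrayDeque<>();
        for (int size : pageSizes) {
            sizes.add(size);
        }
        return () -> {
            Integer size = sizes.poll();
            return new Page(totalHits, new String[size == null ? 0 : size]);
        };
    }

    public static void main(String[] args) {
        ScrollSource source = fixedPages(1000, 200, 200, 200, 200, 200);
        System.out.println(drain(source)); // 1000
    }
}
```

Only the sum of the chunk lengths over the whole scroll should be compared
with the total; a single chunk length is not expected to equal it.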

Cheers,
Boaz

On Tuesday, October 22, 2013 8:20:37 PM UTC+2, Alex wrote:

[...]