Iterator over whole result set (Java API)

Hi,
I need to iterate over the whole result set, meaning all of the
hits.getTotalHits() results in the response (for some query). The required
'size' is not known before I execute the query (I need all matches), so I
cannot use query.setSize(N).

What is the best way to get a 'total hits' iterator and iterate over a
potentially big result set?

I could execute a count query to get the number of matches for that
query and use query.setSize(count query result) in the main query, but I'm
not sure this approach scales very well (I need to be able to iterate
over possibly millions of documents).
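A batch-wise alternative to a single setSize(count) request can be sketched as an iterator that asks for one page at a time and only requests the next page once the current one is used up. PageFetcher and PagingIterator are hypothetical names, not part of the ES Java API; fetch(from, size) stands in for re-running the same query with a different offset. With plain from/size paging every page is a separate search, so results can shift if the index changes between pages, and deep pages get more expensive because each shard still has to rank from + size hits.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for "run the same query with these from/size values".
interface PageFetcher<T> {
    List<T> fetch(int from, int size);
}

// Walks the whole result set one page at a time, so no single request
// ever has to ask for size = total hits.
class PagingIterator<T> implements Iterator<T> {
    private final PageFetcher<T> fetcher;
    private final int pageSize;
    private int from = 0;
    private List<T> page = new ArrayList<>();
    private int pos = 0;
    private boolean exhausted = false;

    PagingIterator(PageFetcher<T> fetcher, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.fetcher = fetcher;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (pos < page.size()) return true;   // still items in the current page
        if (exhausted) return false;
        page = fetcher.fetch(from, pageSize); // one request per page
        from += page.size();
        pos = 0;
        if (page.size() < pageSize) exhausted = true; // short page means last page
        return !page.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return page.get(pos++);
    }
}
```

Memory use is bounded by the page size rather than by the number of matches, which is the property the count-then-setSize approach lacks.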

Thanks,
Tomislav

I think you can use the scrolling API, but it is expensive in terms of
performance.
http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/scroll/
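As a rough sketch of the scrolling pattern: the first request runs the query and returns the first batch together with a scroll id; each follow-up request sends back the id from the latest response and receives the next batch, until an empty batch signals the end. ScrollBatch, ScrollClient and InMemoryScrollClient are hypothetical names used only for illustration (this is not the ES Java API); the in-memory client ignores the query and exists only so the loop can run without a cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// One batch of hits plus the id to send back for the next batch (hypothetical type).
class ScrollBatch {
    final String scrollId;
    final List<String> hits;

    ScrollBatch(String scrollId, List<String> hits) {
        this.scrollId = scrollId;
        this.hits = hits;
    }
}

// Hypothetical client: startScroll runs the query and opens a cursor,
// continueScroll returns the next batch for a given scroll id.
interface ScrollClient {
    ScrollBatch startScroll(String query, int batchSize);
    ScrollBatch continueScroll(String scrollId);
}

class ScrollLoop {
    // Hands every hit to the handler once, one batch at a time, and
    // returns how many hits were seen.
    static long scrollAll(ScrollClient client, String query, int batchSize, Consumer<String> handler) {
        long seen = 0;
        ScrollBatch batch = client.startScroll(query, batchSize);
        while (!batch.hits.isEmpty()) {
            for (String hit : batch.hits) {
                handler.accept(hit);
                seen++;
            }
            // Always send back the id from the latest response.
            batch = client.continueScroll(batch.scrollId);
        }
        return seen;
    }
}

// Toy in-memory client; it treats every stored document as a match.
class InMemoryScrollClient implements ScrollClient {
    private final List<String> docs;
    private final Map<String, Integer> cursors = new HashMap<>();
    private final Map<String, Integer> batchSizes = new HashMap<>();
    private int nextId = 0;

    InMemoryScrollClient(List<String> docs) {
        this.docs = docs;
    }

    public ScrollBatch startScroll(String query, int batchSize) {
        String id = "scroll-" + (nextId++);
        cursors.put(id, 0);
        batchSizes.put(id, batchSize);
        return continueScroll(id);
    }

    public ScrollBatch continueScroll(String scrollId) {
        int from = cursors.get(scrollId);
        int to = Math.min(from + batchSizes.get(scrollId), docs.size());
        cursors.put(scrollId, to); // remember where the next batch starts
        return new ScrollBatch(scrollId, new ArrayList<>(docs.subList(from, to)));
    }
}
```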

Tal

(In reply to Tomislav Poljak, Aug 10, 5:12 pm)

Hi Tal,
can you give me an example of how 'scroll' can be used with the ES Java API?

For example, in code like the following:

    SearchResponse response = client.search(
            searchRequest(index).types(type)
                    .searchType(SearchType.QUERY_AND_FETCH)
                    .source(searchSource().query(queryString(queryStr)).explain(false)))
            .actionGet();

    SearchHits hits = response.getHits();
    Iterator<SearchHit> iterator = hits.iterator();

    while (iterator.hasNext()) {
        SearchHit searchHit = iterator.next();
        // need all hits here
    }

Will I get all hits with the scroll parameter?
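With scrolling, each response still carries only one batch of hits, so a loop over a single response's hits.iterator() sees just that batch. One way to keep the shape of the loop above is to flatten successive batches into one Iterator: fetch the next batch whenever the current one is used up, and stop at the first empty batch. The sketch below abstracts "send the scroll id, get the next batch" as a Supplier; BatchFlatteningIterator is a hypothetical helper, not part of the ES Java API and not the code from the gist.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Flattens a sequence of batches into one Iterator. The Supplier is
// assumed to return the next batch on each call and an empty list once
// the result set is exhausted (the end-of-scroll signal assumed here).
class BatchFlatteningIterator<T> implements Iterator<T> {
    private final Supplier<List<T>> nextBatch;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean done = false;

    BatchFlatteningIterator(Supplier<List<T>> nextBatch) {
        this.nextBatch = nextBatch;
    }

    @Override
    public boolean hasNext() {
        while (!current.hasNext() && !done) {
            List<T> batch = nextBatch.get(); // one round-trip per batch
            if (batch.isEmpty()) {
                done = true;                 // empty batch: nothing left
            } else {
                current = batch.iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }
}
```

In the loop above, iterator would then be a BatchFlatteningIterator<SearchHit> built around a supplier that performs the next scroll request and returns its hits as a list; the per-hit code stays unchanged.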

Thanks,
Tomislav

(In reply to Tal, Tue, 2010-08-10 at 07:47 -0700)

Hi,

here is a gist with an example: gist:517664 on GitHub.

-shay.banon

(In reply to Tomislav Poljak, Tue, Aug 10, 2010 at 6:24 PM)

One thing to remember, though: scrolling is a heavy operation (it's like
a cursor in a database). Also, you won't get duplicates or anything like
that (actually, you won't see any changes made after you started the first
scroll). So it should not be used heavily.
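The cursor behaviour can be illustrated with a toy model: when the scroll is opened, the set of matching documents is frozen, and every later batch is cut from that frozen set. That is why each document comes back at most once and why documents indexed after the first scroll request do not appear. SnapshotIndex is a hypothetical class that only models this behaviour; it is not how Elasticsearch stores scroll state, though keeping such a view of the index alive for the life of the scroll is roughly why a scroll is comparatively heavy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a scroll cursor over a point-in-time snapshot (hypothetical).
class SnapshotIndex {
    private final List<String> docs = new ArrayList<>();
    private final Map<String, List<String>> snapshots = new HashMap<>();
    private final Map<String, Integer> positions = new HashMap<>();
    private int nextId = 0;

    void index(String doc) {
        docs.add(doc);
    }

    // Freeze the current contents; later index() calls do not touch this copy.
    String openScroll() {
        String id = "scroll-" + (nextId++);
        snapshots.put(id, Collections.unmodifiableList(new ArrayList<>(docs)));
        positions.put(id, 0);
        return id;
    }

    // Next batch from the frozen copy; an empty list means the cursor is drained.
    List<String> nextBatch(String scrollId, int batchSize) {
        List<String> snap = snapshots.get(scrollId);
        int from = positions.get(scrollId);
        int to = Math.min(from + batchSize, snap.size());
        positions.put(scrollId, to); // advance, so no document is returned twice
        return snap.subList(from, to);
    }
}
```

A scroll opened after the late write would see it, since it takes a fresh snapshot.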

-shay.banon

(In reply to Shay Banon, Tue, Aug 10, 2010 at 8:45 PM)