What's the best way to get the complete search results with scores in the Java API?

Hi all,

I know that I can get the complete search results without scores using the
Scan/Scroll search type, but I want to obtain all the search hits with their
scores.

I use a count query to get the number of search hits and then fetch them page
by page (100 records per page).
However, if the result set is large, say 5000 hits, the program does not
reliably return all of them, which seems very weird to me.
How can I resolve this problem and retrieve the complete set of search hits?
My code is as follows:
BoolQueryBuilder qb = QueryBuilders
        .boolQuery()
        .must(matchPhraseQuery("body_cleansed", entity));

ArrayList<String> result = new ArrayList<String>();
QueryData qd = new QueryData();
CountResponse countResp = qd.countQuery(qb, indexName);

long count = countResp.getCount();
System.out.println("the hit size of search is: " + count);
// Number of 100-hit pages, rounded up to cover a partial last page.
long num_of_pages = count / 100;
long remainder = count % 100;
if (remainder != 0)
    num_of_pages += 1;
for (int i = 0; i < num_of_pages; i++)
{
    int begin = i * 100;
    SearchResponse searchResp;
    if (i == (num_of_pages - 1) && remainder != 0)
    {
        // Last, partial page: request only the remaining hits.
        searchResp = qd.searchFromTo(qb, begin, (int) remainder, indexName);
    }
    else
        searchResp = qd.searchFromTo(qb, begin, 100, indexName);
    for (SearchHit hit : searchResp.getHits())
    {
        result.add(hit.getSource().get("stream_id").toString());
    }
    System.out.println("the size of the result is: " + result.size());
}
System.out.println("the total returned result for " + entity
        + " size is: " + result.size());

QueryData is a class used to run all kinds of queries.

public class QueryData {
    private Node node;
    private Client client;

    public QueryData()
    {
        node = nodeBuilder().clusterName("MSRA-KM").client(true).node();
        client = node.client();
    }

    public void close()
    {
        client.close();
        node.close();
    }

    public CountResponse countQuery(QueryBuilder qb, String indexName)
    {
        CountResponse countResponse = client.prepareCount(indexName)
                .setTypes("kba")
                .setQuery(qb)
                .execute()
                .actionGet();
        return countResponse;
    }

    public SearchResponse searchFromTo(QueryBuilder qb, int from, int size,
            String indexName)
    {
        SearchResponse searchResp = client.prepareSearch(indexName)
                .setTypes("kba")
                .setQuery(qb)
                .setFrom(from).setSize(size).setExplain(true)
                .execute()
                .actionGet();
        return searchResp;
    }
}

FYI, I'm running this against a cluster with 3 indices that all share the
same alias. Could that have a negative effect?
Thanks very much!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

By the way, I initially thought that this was caused by the timeout settings,
so I set a timeout value in the actionGet method.
But in that case, after I ran some queries, an exception occurred:
org.elasticsearch.ElasticSearchException: Timeout waiting for task.
In this post
https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/$20org.elasticsearch.ElasticSearchTimeoutException:$20Timeout$20waiting$20for$20task.|sort:relevance/elasticsearch/bxUOxNG6dMs/7vPZAARXkrEJ
kimchy said this is caused by the timeout setting.
On Tuesday, April 16, 2013 2:48:18 PM UTC+8, Jingang Wang wrote:


Your approach of iterating through the search result hits with setFrom()
and setSize() is limited by the available memory on the heap. If you use
"setFrom()", then all hits before the "from" position must be internally
computed. That is the reason why scan/scroll search has been introduced
as a better, more efficient alternative. Also you use
"setExplain(true)", which uses quite a bit of extra memory.
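
To illustrate the point about setFrom(): below is a minimal sketch in plain
Java (no Elasticsearch calls) of a simplified cost model. It assumes that for
every page request each shard has to rank up to from + size hits before the
page can be assembled, so this is an upper bound, not a measurement. The
class name DeepPagingCost, its methods, and the shard count of 3 are
hypothetical and chosen only for illustration; real numbers depend on the
cluster.

```java
// Simplified, hypothetical cost model for from/size paging.
// Assumption: each shard ranks up to (from + size) hits per page request.
class DeepPagingCost {

    /** Upper bound on the hits one shard must rank for a single page. */
    public static long hitsPerShard(long from, int size) {
        return from + size;
    }

    /** Upper bound on hits ranked across all shards while paging through totalHits. */
    public static long totalCollected(long totalHits, int pageSize, int shards) {
        long collected = 0;
        for (long from = 0; from < totalHits; from += pageSize) {
            collected += shards * hitsPerShard(from, pageSize);
        }
        return collected;
    }

    public static void main(String[] args) {
        long totalHits = 5000; // result size mentioned in the question
        int pageSize = 100;    // page size used in the question
        int shards = 3;        // assumed shard count, for illustration only

        System.out.println("last page, per shard: "
                + hitsPerShard(totalHits - pageSize, pageSize));
        System.out.println("all pages, all shards: "
                + totalCollected(totalHits, pageSize, shards));
    }
}
```

Under these assumptions the last page alone asks each shard for up to 5000
hits, and paging through all 5000 results ranks up to 382500 hits in total,
which is why the cost grows much faster than the number of results returned.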

Jörg

On 16.04.13 08:48, Jingang Wang wrote:

However, if the result set is large, say 5000 hits, the program does not
reliably return all of them, which seems very weird to me.
How can I resolve this problem and retrieve the complete set of search hits?


Hi Jörg,

Thanks for your reply. Do you know of a better way to retrieve all the
search hits with their scores?

On Thursday, April 18, 2013 4:02:07 AM UTC+8, Jörg Prante wrote:
