Confused about Java API GetRequest and SearchRequest

Hi all,

we store our searchable data in the _source field, so we are not indexing
a classic document such as a PDF. In addition, we store the database
primary key in the _source field, like this:

_source{ "searchfield1":"value1","searchfield2":"value2","dbkey":"4711"}

If we want to get the dbkeys from ES with a search query, in order to
process them further, is the following the preferred approach?

  1. send a CountRequest to get the "count" of the searchquery,
  2. after this send a SearchRequest with the same query and with
    SearchType.SCAN and setSize(count) to receive the ids,
  3. then send a GetRequest in an id-based for loop to get all documents
    (_source fields) to receive the dbkey field

I think there must be another approach. Can you help me?

Some code for better understanding:

....
CountResponse countResp = esClient.prepareCount(index)
        .setTypes(types)
        .setQuery(QueryBuilders.queryString(luceneNativeQuery))
        .execute().actionGet();
int total = ((Long) countResp.getCount()).intValue();

SearchResponse searchResponse = esClient.prepareSearch(index)
        .setTypes(types)
        .setQuery(QueryBuilders.queryString(luceneNativeQuery))
        .setSize(total)
        .setSearchType(SearchType.SCAN)
        .execute().actionGet();

for (SearchHit hit : searchResponse.getHits().getHits()) {
    GetRequest getRequest = new GetRequest(index, hit.getType(), hit.getId());
    GetResponse getResponse = esClient.get(getRequest).actionGet();

    if (!getResponse.getSource().isEmpty()) {
        DataReference dataReference =
                (DataReference) getResponse.getSource().get(DATA_REFERENCE);
        searchResults.add(dataReference);
    }
}
....

Best regards and many thanks in advance
Alex

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

The scan search type is meant for fetching a large number of results,
which, as far as I understand, is what you need.
For that you need neither the count request nor a scan size equal to the
total number of documents to retrieve.
You just initialize the scan with a reasonable size (remember it's per
shard), like 10 or 50. You then get back a scroll id, which you use to
scroll through the results: first up to size*num_shards documents, then
the next batch, and so on, until you get back no documents, which means
that you have fetched them all.
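
To make the loop concrete, here is a minimal sketch of its control flow. It runs against an in-memory stand-in rather than a cluster, so the Scroll interface, InMemoryScroll and collectDbKeys are hypothetical names invented for illustration. The client calls in the comments are how I recall the 0.90-era Java API and are not compiled here; if I remember correctly, with SCAN the initial response carries only the scroll id and the hits arrive with the scroll requests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Self-contained model of the scan/scroll loop; no Elasticsearch dependency.
class ScrollSketch {

    /** Hypothetical stand-in for a scroll: each call returns the next batch of _source maps. */
    interface Scroll {
        List<Map<String, Object>> nextBatch();
    }

    /** In-memory scroll that hands out up to sizePerShard * numShards documents per batch. */
    static final class InMemoryScroll implements Scroll {
        private final List<Map<String, Object>> docs;
        private final int batchSize;
        private int position = 0;

        InMemoryScroll(List<Map<String, Object>> docs, int sizePerShard, int numShards) {
            this.docs = docs;
            this.batchSize = sizePerShard * numShards;
        }

        @Override
        public List<Map<String, Object>> nextBatch() {
            int end = Math.min(position + batchSize, docs.size());
            List<Map<String, Object>> batch = new ArrayList<>(docs.subList(position, end));
            position = end;
            return batch; // empty once everything has been handed out
        }
    }

    /**
     * Drains the scroll. With the real client (assumed calls, from memory):
     *   start: esClient.prepareSearch(index).setSearchType(SearchType.SCAN)
     *              .setScroll(new TimeValue(60000)).setQuery(query).setSize(50)
     *              .execute().actionGet();
     *   next:  esClient.prepareSearchScroll(response.getScrollId())
     *              .setScroll(new TimeValue(60000)).execute().actionGet();
     */
    static List<String> collectDbKeys(Scroll scroll) {
        List<String> dbKeys = new ArrayList<>();
        while (true) {
            List<Map<String, Object>> batch = scroll.nextBatch();
            if (batch.isEmpty()) {
                break; // no hits left: all documents have been fetched
            }
            for (Map<String, Object> source : batch) {
                Object key = source.get("dbkey");
                if (key != null) {
                    dbKeys.add(key.toString());
                }
            }
        }
        return dbKeys;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            Map<String, Object> source = new LinkedHashMap<>();
            source.put("searchfield1", "value" + i);
            source.put("dbkey", String.valueOf(4711 + i));
            docs.add(source);
        }
        // Size 1 per shard on 3 shards: batches of 3, 3 and 1, then an empty batch.
        System.out.println(collectDbKeys(new InMemoryScroll(docs, 1, 3)));
    }
}
```

Note that neither a count nor a size equal to the total appears anywhere: the loop stops on the first empty batch.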

The search response should already contain the _source too, so there is
no need to use the get API either.
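
As a rough illustration of reading the key straight from a hit: the _source comes back as a map of plain JSON values (what hit.getSource() returns in the Java client, as far as I recall), so a dbkey stored as "4711" arrives as a String and has to be wrapped rather than cast. The DataReference class below is a hypothetical stand-in, since the real one is not shown in the thread, and fromSource is an invented helper name.

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained sketch: turn one hit's _source map into a value object.
class SourceSketch {

    /** Hypothetical value type; the thread's own DataReference class is not shown. */
    static final class DataReference {
        final String dbKey;

        DataReference(String dbKey) {
            this.dbKey = dbKey;
        }
    }

    /** Returns null when the source is missing or has no dbkey field. */
    static DataReference fromSource(Map<String, Object> source) {
        if (source == null) {
            return null;
        }
        Object raw = source.get("dbkey");
        // The map holds JSON values (String, Number, Map, List), never a custom class.
        return raw == null ? null : new DataReference(raw.toString());
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("searchfield1", "value1");
        source.put("searchfield2", "value2");
        source.put("dbkey", "4711");
        System.out.println(fromSource(source).dbKey);
    }
}
```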

You can find an example using the Java API in the Elasticsearch
documentation.

Makes sense?

(In reply to Alex's original message above, sent Thursday, October 17, 2013 5:37:52 PM UTC+2.)

Hi Luca, thanks for the introduction!
I'll try the scrolling approach.

(In reply to Luca Cavanna's message above, sent Thursday, October 17, 2013 20:46:32 UTC+2.)