I noticed that there is a count API, which is basically a query without
scoring that only counts the number of hits.
Now I need to know whether a query has any results or not.
Of course I can do that by doing a count and then testing for 0 results, but
that is probably not the most efficient approach.
Also, the exists filter does not seem to solve this problem.
Executing the query, plus the network latency, will take far longer than any
time spent on the client side checking whether the count is 0. The query needs
to be executed regardless to determine whether it returns a result. I would
simply use the count API.
--
Ivan
On Mon, Sep 23, 2013 at 6:22 AM, Peter van der Weerd pw2@bitmanager.nl wrote:
> Hi All,
> I noticed that there is a count API, which is basically a query without
> scoring that only counts the number of hits.
> Now I need to know whether a query has any results or not.
> Of course I can do that by doing a count and then testing for 0 results, but
> that is probably not the most efficient approach.
> Also, the exists filter does not seem to solve this problem.
At the moment, there is no option to just check whether a query matches any
documents other than using the count API or the count search type and
checking #hits > 0. In Lucene you could use early termination
(CollectionTerminatedException) to stop enumerating docs as soon as the first
document is found, so there is something we could do there.
If checking #hits > 0 is not fast enough for you, I suggest you open an issue,
including an example of your use case, the timings involved and why they
are not good enough for you.
Of course, you can also have some fun writing a plugin or a pull request with
an extra endpoint (modelled after the Count API) and see how much that
helps. It shouldn't be too hard and is a nice exercise.
Cheers,
Boaz
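To make the early-termination idea concrete, here is a toy sketch in plain Python. It is not Lucene or Elasticsearch code: count_matches and has_match are hypothetical names, and a list of dicts stands in for an index. It only shows why an existence check can stop at the first match while a count has to visit every document.

```python
from typing import Callable, Iterable


def count_matches(docs: Iterable[dict], matches: Callable[[dict], bool]) -> int:
    """Visit every document and count the matching ones (what a count has to do)."""
    return sum(1 for doc in docs if matches(doc))


def has_match(docs: Iterable[dict], matches: Callable[[dict], bool]) -> bool:
    """Stop at the first matching document (the early-termination idea)."""
    # any() short-circuits: the generator is not advanced past the first match.
    return any(matches(doc) for doc in docs)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    docs = [{"level": "info"}, {"level": "error"}, {"level": "error"}]
    is_error = lambda doc: doc["level"] == "error"
    print(count_matches(docs, is_error), has_match(docs, is_error))
```

How much this would save in practice depends on how many documents match and where the first match sits, which is why timings for the real use case matter.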
Even when a single document is requested, all matching documents will be
collected in order to count the number of matches. The count API can be
expected to be slightly faster than collecting a single document since
Elasticsearch will be able to use a specialized collector that only counts
matches and doesn't care about scoring or field values.
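For reference, the "count and check for more than 0" approach might look like the sketch below, using only the Python standard library. The base URL, the index name and the helper names (build_count_request, has_hits, query_has_results) are assumptions made for illustration. The request body wraps the query in a "query" key, which is what more recent Elasticsearch versions expect; older releases may expect a different body shape, so check the documentation for your version.

```python
import json
import urllib.request


def build_count_request(base_url: str, index: str, query: dict) -> urllib.request.Request:
    """Build an HTTP request against the _count endpoint of one index."""
    # Assumption: the query is wrapped in a top-level "query" key.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_count",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def has_hits(count_response: dict) -> bool:
    """Return True if the parsed count response reports at least one match."""
    return count_response.get("count", 0) > 0


def query_has_results(base_url: str, index: str, query: dict) -> bool:
    """Send the count request and test the result (needs a reachable cluster)."""
    with urllib.request.urlopen(build_count_request(base_url, index, query)) as resp:
        return has_hits(json.load(resp))
```

Something like query_has_results("http://localhost:9200", "logs", {"term": {"level": "error"}}) would then answer the original question, with "logs" and the term query being placeholders.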