I realize "limit" is not a limit for response size. I'm ok with getting
more than one result, and I'm not relying on limit to control size.
I often use size in conjunction with limit. I do this when I don't really
care how many items I get back, as long as the count is within a range; I
add the limit to help decrease the load on the shards.
That said, I need to understand what expectations I can have around limit.
Is it completely non-deterministic? Or can I have reasonable expectations
about it?
I will propose an example and describe my expectations:
Node setup:
1 index
1 mapping
5 shards
1,000,000 documents sharded across the 5 shards
1000 matching documents sharded across the 5 shards
Let's assume an even distribution of the matching documents: 200 documents
per shard. I realize an exactly even distribution like this is not
realistic.
If I place a limit of 5 on the query, I expect 25 documents back. That is,
I get 5 documents from each shard. I expect this because I have at least 5
matching documents per shard. In fact, I have many more than 5 matching
documents per shard. But I expect the limit to return five documents from
each shard.
Now I realize there are lots of real-world circumstances that would cause
the query to return fewer than 25 documents. Let's ignore those for the
time being and remain under the assumption that the distribution is even.
Now, if I place a limit of 1 on the query, I expect 5 documents back.
Are these two expectations correct?
Now let's assume a worst case scenario: all of the matching documents are
on one shard. A limit of 5 should still return 5 documents. A limit of 1
should return 1 document.
If these expectations are true, then my original scenario is valid and a
limit of 1 should still return 1 document.
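To make those expectations concrete, here is a small Python sketch of the model I have in mind. It only encodes my assumption that each shard returns at most "limit" matching documents; it is not a statement of how Elasticsearch actually behaves, and the function name is made up for illustration.

```python
def expected_hits(limit, matches_per_shard):
    """Hits I would expect if every shard returns at most `limit` matches.

    This is only the mental model described above, not Elasticsearch's
    documented behavior.
    """
    return sum(min(limit, matches) for matches in matches_per_shard)


even = [200] * 5              # 1000 matches spread evenly over 5 shards
skewed = [1000, 0, 0, 0, 0]   # worst case: every match on one shard

print(expected_hits(5, even))    # 25
print(expected_hits(1, even))    # 5
print(expected_hits(5, skewed))  # 5
print(expected_hits(1, skewed))  # 1
```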
So are these expectations valid? Or is limit completely non-deterministic?
Size does work, but if I can improve performance with a limit, I would like
to do so. It is possible that I have tens of thousands of matching
documents, and limit could be an excellent short-circuit. Basically I want
the shard to stop searching as soon as it has found one document.
Also, I don't have the document _id so I cannot make the HEAD call.
Do these clarifications help?
On Wednesday, October 22, 2014 3:57:25 PM UTC-4, Jörg Prante wrote:
"limit" is not a limit for response size. It sets a per-shard limit, which
is quite low level, so that the resources of each ES shard are not put
under so much pressure. It is not guaranteed that the sum of the limits on
the shards matches the total length of the response.
The limit parameter for the response is the "size" parameter. Can you try
POST profiles/profile/_search
{
  "size" : 1,
  "query": {
    "constant_score" : {
      "filter" : {
        "term": {
          "profile_id": "salinger-23145"
        }
      }
    }
  }
}
and see if this works better?
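If you are issuing this from code rather than from a REST console, a minimal Python sketch of the same body could look like the following. The helper names are made up for illustration, and the response check assumes the 1.x response shape where hits.total is a plain integer.

```python
import json


def existence_query(field, value):
    """Build a search body that asks for at most one hit for an exact term."""
    return {
        "size": 1,
        "query": {
            "constant_score": {
                "filter": {"term": {field: value}}
            }
        },
    }


def has_match(response):
    """True if the search response reports at least one matching document.

    Assumption: hits.total is an integer, as in Elasticsearch 1.x responses.
    """
    return response["hits"]["total"] > 0


body = existence_query("profile_id", "salinger-23145")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST profiles/profile/_search; has_match() is then applied to the parsed response.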
If you want to perform a true existence check of a doc, you should use the
doc _id and a head request, something like
HEAD profiles/profile/id
which is faster than a search.
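As a rough sketch of that existence check using only the Python standard library: the base URL and the document id below are placeholders, and the helper names are invented for the example. A 200 status means the document exists and a 404 means it does not.

```python
import urllib.error
import urllib.parse
import urllib.request


def head_request(base_url, index, doc_type, doc_id):
    """Build (but do not send) a HEAD request for a single document."""
    url = "{}/{}/{}/{}".format(
        base_url.rstrip("/"), index, doc_type,
        urllib.parse.quote(doc_id, safe=""),
    )
    return urllib.request.Request(url, method="HEAD")


def document_exists(request):
    """Send the request: 200 means the doc exists, 404 means it does not."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error is a real failure, not "missing"


# Placeholder host and id; nothing is sent until document_exists() is called.
req = head_request("http://localhost:9200", "profiles", "profile", "some-id")
print(req.get_method(), req.full_url)
```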
Jörg
On Wed, Oct 22, 2014 at 8:58 PM, Jeff Gandt <jeff....@gmail.com> wrote:
I have a query that I want to return only one document. Basically, I want
to do an existence check on a document with a given term filter.
I am executing:
POST profiles/profile/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "limit": {
                "value": 1
              }
            },
            {
              "term": {
                "profile_id": "salinger-23145"
              }
            }
          ]
        }
      }
    }
  }
}
The profiles/profile mapping has tens of millions of documents in it, two
of which match the given terms query (when the limit is removed entirely).
When I execute the query, I get zero results back. However, if I change
the limit value to two (2) then one (1) result is returned. If I change the
limit value to three (3) then two (2) results are returned. It's almost
like there is an off-by-one error in limit.
So am I:
- Writing the query wrong?
I tried placing the limit outside of the must, bool, and filter clauses.
That caused errors in each case. But I may have just done something silly.
- Misunderstanding limit?
My understanding of limit is that it returns no more than x documents per
shard. Given that I have five shards and at least two documents matching
the query, I should be returning between one and five documents. However,
looking at the limit documentation
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-limit-filter.html),
I suspect that I may be misunderstanding how limit works. The wording "to
execute on" leads me to believe that it may only be selecting ONE document
against which the term filter is run. Thus, if the one document that it
tests doesn't match, it returns zero results. However, the limit 2
returning one document leads me to believe that my original understanding
is correct.
- Staring at an elasticsearch limit bug?
Unfortunately I have been unable to reproduce the error after creating
test indexes and mappings. The limit behaves exactly as I expect in every
other case.
- Doing something else that is equally silly?
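To illustrate the second reading of limit from the "Misunderstanding limit?" item above (the limit caps how many documents each shard examines, rather than how many matches it returns), here is a toy Python model. The shard contents and function names are hypothetical, and this is only a sketch of that hypothesis, not of Elasticsearch's implementation.

```python
def hits_if_limit_caps_examined(limit, shards, predicate):
    """Toy model: each shard examines only its first `limit` documents
    and returns whichever of those satisfy the filter."""
    hits = []
    for docs in shards:
        hits.extend(doc for doc in docs[:limit] if predicate(doc))
    return hits


def is_match(doc):
    """Stand-in for the term filter on profile_id."""
    return doc == "salinger-23145"


# Hypothetical data: two shards; the matching doc is not first on its shard.
shards = [
    ["other-1", "salinger-23145", "other-2"],
    ["other-3", "other-4"],
]

print(hits_if_limit_caps_examined(1, shards, is_match))  # []
print(hits_if_limit_caps_examined(2, shards, is_match))  # ['salinger-23145']
```

Under this reading a limit of 1 can legitimately return nothing even though a match exists, because the single document examined on each shard happens not to match.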
Any help or suggestions are appreciated. Can I provide any clarifications?
Thanks,
.jpg
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/86988787-fb31-4b5a-b570-427750177ecd%40googlegroups.com
.
For more options, visit https://groups.google.com/d/optout.