Thanks for the detailed reply.
I am a bit confused about "and" vs. "bool" filter execution. I read this post
http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/ on
the elasticsearch blog. From that, I thought the bool filter would work by
basically creating a bitset for the entire segment(s) being examined. If
the filter value changes every time, will this still be cheaper than an "and"
filter that will just examine the matching docs? My segments can be very
big, and this query, for example, only matched one document.
There is no match_all query filter. There is a "match" query filter on a
field named "all".
Based on your feedback, I moved all filters, including the query filter,
into the bool filter. However, it didn't change things: the query is an
order of magnitude slower with the range filter, unless I set execution to
fielddata. I am using 1.2.2; I tried the strategy setting anyway and it
didn't make a difference.
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "source_id": ["s1", "s2", "s3"]
              }
            },
            {
              "query": {
                "match": {
                  "all": {
                    "query": "foo"
                  }
                }
              }
            },
            {
              "range": {
                "published": {
                  "to": 1406064191883
                }
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "crawlDate": {
        "order": "desc"
      }
    }
  ]
}
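For comparison, the only change that makes the query fast for me is in the range clause. A minimal sketch of that clause follows, assuming the "execution" option sits next to the field name inside the range filter, as I understand the 1.x range filter documentation; everything else in the query stays as above:

```json
{
  "range": {
    "published": {
      "to": 1406064191883
    },
    "execution": "fielddata"
  }
}
```

With "fielddata" the range is checked against values loaded into memory per document rather than by walking the indexed terms, which may be why it behaves so differently here.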
On Wednesday, July 30, 2014 4:30:10 AM UTC-7, Clinton Gormley wrote:
Don't use the "and" filter - use the "bool" filter instead. They have
different execution modes and the bool filter works best with bitset
filters (but also knows how to handle non-bitset filters like geo etc).
Just remove the "and", "or" and "not" filters from your DSL vocabulary.
Also, not sure why you are ANDing with a match_all filter - that doesn't
make much sense.
Depending on which version of ES you're using, you may be encountering a
bug in the filtered query which ended up always running the query first,
instead of the filter. This was fixed in v1.2.0 (issue #6247, "XFilteredQuery
defaults to Query First strategy", in elastic/elasticsearch on GitHub). If you
are on an earlier version you can force filter-first execution manually by
specifying a "strategy" of "random_access_100". See the filtered query
documentation for the available strategies.
In summary, (and taking your less granular datetime clause into account)
your query would be better written as:
GET /_search
{
  "query": {
    "filtered": {
      "strategy": "random_access_100",   #### pre 1.2 only
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "source_id": [ "s1", "s2", "s3" ]
              }
            },
            {
              "range": {
                "published": {
                  "gte": "now-1d/d"   #### coarse grained, cached
                }
              }
            },
            {
              "range": {
                "published": {
                  "gte": "now-30m"   #### fine grained, not cached, could use fielddata too
                },
                "_cache": false
              }
            }
          ]
        }
      }
    }
  }
}
On 30 July 2014 10:55, David Pilato <da...@pilato.fr> wrote:
Maybe a stupid question: why did you put that filter inside a query and
not within the same filter you have at the end?
For my test case it's the same every time. In the "real" query it will
change every time, but I planned to not cache this filter and have a less
granular date filter in the bool filter that would be cached. However, while
debugging I noticed slowness with the date range filters even while testing
with the same value repeatedly.
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/af76ca41-9045-4a4f-b82c-b9c86d964ace%40googlegroups.com
For more options, visit https://groups.google.com/d/optout.