Can the order of filters impact performance?

I've got a simple term query with two filters. One filter is a date range
filter and the other is a terms filter for handling ACLs. The ACL filter,
on average, has about 1000 terms, so it's heavy. I suspect that running the
date range filter first can significantly restrict the document set so that
the ACL filter runs more efficiently (given the smaller subset).

I don't believe Lucene does any query optimization to handle this. Is there
any way to guarantee the order of filters? Are my assumptions correct that
the order of filters can impact performance?

Thanks,
-Eric

I would also really like to know the answer to this question.

Additionally, I'm wondering whether the query can be run before the filters
or vice versa. Will this impact performance? Does Elasticsearch have built-in
logic to optimize queries independent of their order in a request? If we can
control the order in which the pieces of a query / filter are executed, and
that order does impact performance, then please give an implementation example.

On Monday, July 9, 2012 9:11:33 AM UTC-7, egaumer wrote:

I've got a simple term query with two filters. One filter is a date range
filter and the other is a terms filter for handling ACLs. The ACL filter,
on average, has about 1000 terms, so it's heavy. I suspect that running the
date range filter first can significantly restrict the document set so that
the ACL filter runs more efficiently (given the smaller subset).

I don't believe Lucene does any query optimization to handle this. Is
there any way to guarantee the order of filters? Are my assumptions correct
that the order of filters can impact performance?

Thanks,
-Eric

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

On Thu, 2013-02-07 at 21:47 -0800, Brian Jones wrote:

I would also really like to know the answer to this question.

Additionally, I'm wondering if the query can be run before the filters
or vice versa? Will this impact performance? Does Elasticsearch have
built in logic to optimize queries independent of their order in a
data request? If we can control the order in which pieces of a
query / filter are executed and they do impact performance, then
please give an implementation example.

Filters are executed in the order in which they are passed to an and/or
filter or a must/should clause. In both filters and queries, must clauses
are executed before should clauses.
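
As a rough illustration of the ordering point, the sketch below assembles an "and" filter whose clauses are listed in the order they should run, with the date range ahead of the large ACL terms filter. Python is used only to build the JSON request body. The helper name ordered_and_filter, the field names "date" and "perms", and the sample values are assumptions made for this example, not something prescribed by Elasticsearch.

```python
import json


def ordered_and_filter(*clauses):
    """Wrap filter clauses in an `and` filter, preserving the given order."""
    return {"and": list(clauses)}


# Hypothetical field names and values, for illustration only.
date_filter = {
    "range": {
        "date": {"from": "2000-12-17T06:55:00Z", "to": "2001-12-17T06:55:00Z"}
    }
}
acl_filter = {"terms": {"perms": [70008497, 70008496, 70008495]}}

# The date range is passed first, so it is the first clause in the array.
and_filter = ordered_and_filter(date_filter, acl_filter)

print(json.dumps(and_filter, indent=2))
```

Whether this ordering actually changes timings depends on the filter types involved, so it is worth measuring on real data before relying on it.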

Also, in the next version of ES, "cheap" (i.e. bitset) "should" filter
clauses are executed before the more expensive filter clauses (e.g.
geo-distance).

In a filtered query, I believe the filter and query are executed
together, i.e. filter->query->filter->query and so on. In the next
version, you'll be able to control the order of execution.

In the search API, the top-level "filter" is executed after "query" (and
after facets).
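
To make the two placements concrete, here is a small sketch that builds both request shapes: a filtered query, where the filter is applied together with the query, and a top-level search API "filter", which is applied after the query and after facets. The function names, the "headline" and "perms" fields, and the sample values are hypothetical and only serve the illustration.

```python
import json


def filtered_query_body(query, flt):
    """Request body where the filter is wrapped inside a `filtered` query."""
    return {"query": {"filtered": {"query": query, "filter": flt}}}


def top_level_filter_body(query, flt):
    """Request body using the search API's top-level `filter` element,
    which is applied after the query (and after facets)."""
    return {"query": query, "filter": flt}


# Hypothetical query and filter, for illustration only.
query = {"term": {"headline": "growth"}}
flt = {"terms": {"perms": [70008497, 70008496]}}

inside = filtered_query_body(query, flt)
outside = top_level_filter_body(query, flt)

print(json.dumps(inside, indent=2))
print(json.dumps(outside, indent=2))
```

The first shape is the one to use when the filter should narrow what the query has to work on; the second leaves the query untouched and only trims the returned hits.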


One more thing to add to Clint's answer: the terms filter and the date range filter internally always execute the "full" filter and end up being represented as a bitset. Even with 100 terms in a terms filter, this should be fast, and filter caching works nicely for ACL-type logic. One thing I would add is to use _cache_key for the ACL filter, so that the big list of terms isn't used as the filter cache key; the _cache_key can be something like _user_id_1122_acl.
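
A minimal sketch of the _cache_key suggestion follows. It builds a terms filter whose cache entry is keyed by a short per-user string rather than by the full list of terms. The helper name acl_terms_filter, the "perms" field, and the sample permission ids are assumptions for illustration; placing _cache_key alongside the field inside the terms filter reflects the filter DSL of the Elasticsearch versions discussed in this thread and should be checked against the version in use.

```python
import json


def acl_terms_filter(field, perms, user_id):
    """Build a terms filter cached under a short, per-user key."""
    return {
        "terms": {
            field: list(perms),
            # Short key used for the filter cache instead of the term list.
            "_cache_key": "_user_id_%s_acl" % user_id,
        }
    }


# Hypothetical field name, permission ids and user id.
acl = acl_terms_filter("perms", [70008497, 70008496, 70008495], 1122)

print(json.dumps(acl, indent=2))
```

The key has to change whenever the user's permissions change, otherwise a stale cached bitset could be reused for the new permission list.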

On Feb 8, 2013, at 11:19 AM, Clinton Gormley clint@traveljury.com wrote:

On Thu, 2013-02-07 at 21:47 -0800, Brian Jones wrote:

I would also really like to know the answer to this question.

Additionally, I'm wondering if the query can be run before the filters
or vice versa? Will this impact performance? Does Elasticsearch have
built in logic to optimize queries independent of their order in a
data request? If we can control the order in which pieces of a
query / filter are executed and they do impact performance, then
please give an implementation example.

Filters are executed in the order they are passed in to an and/or or
must/should clause. must clauses are executed before should clauses
(this goes for filters and queries)

also, in the next version of ES, "cheap" (ie bitset) "should" filter
clauses are executed before the more expensive filter clauses (eg
geo-distance).

in a filtered query, i believe the filter and query are executed
together, ie filter->query->filter->query etc and in the next version,
you'll be able to control the order of execution.

in the search API, "filter" is executed after "query", (and after
facets).


We did a fair amount of performance testing around permission filters and
the bitsets (i.e., terms filter) perform really well. We expected issues
given some past experience with a proprietary search product but this
wasn't the case with elasticsearch and we used filters ranging from 8K to
10K unique permissions.

I roughly recall the first query averaging around 150ms, with subsequent
queries (cached filters) averaging ~30ms. The setup: 250K unique queries,
a 2-node cluster with 1 shard (~300GB) and 1 replica, SSD, 128GB RAM, 8GB
heap, 10GigE, no faceting and no sorting, basic boolean searches. We were
mainly interested in how these large terms filters affected performance.

Queries looked roughly like the following...

{
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "should": [
                        {"field": {"headline": "Growth"}},
                        {"field": {"text": "Growth"}}
                    ]
                }
            },
            "filter": {
                "and": [{
                    "numeric_range": {
                        "date": {"from": "2000-12-17T06:55:00Z", "to": "2001-12-17T06:55:00Z"}
                    }
                }, {
                    "terms": {
                        "perms": [70008497, 70008496, 70008495, ...,
                                  70000166, 70000170, 70000002]
                    }
                }]
            }
        }
    }
}

We didn't use a cache key but that would obviously help reduce cache sizes.
These were very contrived tests.

On Tuesday, February 12, 2013 5:40:45 PM UTC-5, kimchy wrote:

One more thing to add to clint answer, terms filter and date range
internally always execute the "full" filter and end up being represented as
a bitset. Even with 100 terms in a terms filter, this should be fast, and
filter caching, specifically for ACL type logic, is nicely cached. One
thing that I would add, is use _cache_key for the ACL filter, so the big
list of terms won't be used as the relevant filter cache key, the
_cache_key can be something like _user_id_1122_acl.
