TermFilter as part of OR Filter - Caching


(aditya tripathi) #1

Hi,
I had a quick question.

In our app we use a lot of term filters as part of OR filters. Term filters
are automatically cached. Are they also cached when they are part of an OR
filter?

I observe a sharp rise in filter cache size when users whose OR filters
contain a lot of term filters log in to the system.

-Thanks.



(Clinton Gormley) #2

Hiya

In our app we use a lot of term filters as part of OR filters. Term filters
are automatically cached. Are they also cached when they are part of an OR
filter?

No, not unless you specify that the OR filter should be cached.

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "or" : {
          "_cache" : 1,
          "filters" : [
            {
              "term" : {
                "bar" : 2
              }
            },
            {
              "term" : {
                "foo" : 1
              }
            }
          ]
        }
      }
    }
  }
}
'

You may also try setting the individual term filters to NOT be cached:

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "or" : {
          "_cache" : 1,
          "filters" : [
            {
              "term" : {
                "_cache" : 0,
                "foo" : 1
              }
            },
            {
              "term" : {
                "_cache" : 0,
                "bar" : 2
              }
            }
          ]
        }
      }
    }
  }
}
'

Let us know how this works out - I'd be interested to see.

clint


(aditya tripathi) #3

You are right - You have to explicitly set _cache=true for OR Filters to get cached.
And all the Term Filters in that OR Filter are cached by default unless explicitly set to false.


(Shay Banon) #4

Also, it might make sense to use a terms filter. And if you have a
combination of term/terms/range filters, it might make sense to use a bool
filter (which does bitwise operations) rather than an or filter. It will
probably improve performance.
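
As a rough sketch of the suggestion above, the "or" filter from the earlier example could be rewritten as a bool filter whose "should" clauses replace the "filters" list. The Python snippet below only builds and prints the JSON request body; it does not contact Elasticsearch. The field names foo and bar come from the earlier example, the helper name or_to_bool is hypothetical, and the shape of the bool filter (a "should" list of filters) is an assumption based on the query DSL of that era.

```python
import json


def or_to_bool(or_filter):
    """Rewrite {"or": {"filters": [...]}} as {"bool": {"should": [...]}}.

    Hypothetical helper for illustration only; caching flags are not copied.
    """
    clauses = or_filter["or"]["filters"]
    return {"bool": {"should": list(clauses)}}


or_filter = {
    "or": {
        "filters": [
            {"term": {"bar": 2}},
            {"term": {"foo": 1}},
        ]
    }
}

# Same constant_score wrapper as in the curl example.
query = {"query": {"constant_score": {"filter": or_to_bool(or_filter)}}}
print(json.dumps(query, indent=2))
```

The printed body could then be passed to curl with -d in the same way as the examples earlier in the thread.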


(Tim J) #5

Shay,
I'm a bit confused about the performance of bool vs. and/or
filters. In your previous post you mentioned that it might make sense
to try a bool filter as it would be more performant than or. The docs
for and/or filters (http://www.elasticsearch.org/guide/reference/query-dsl/and-filter.html)
seem to say the opposite. Could you clarify?

Thanks,
-Tim


(Clinton Gormley) #6

On Thu, 2012-05-24 at 09:22 -0700, Tim J wrote:

Shay,
I'm a bit confused about the performance of bool vs. and/or
filters. In your previous post you mentioned that it might make sense
to try a bool filter as it would be more performant than or. The docs
for and/or filters (http://www.elasticsearch.org/guide/reference/query-dsl/and-filter.html)
seem to say the opposite. Could you clarify?

As I understand it:

  • an 'or' filter is not cached. instead, the individual clauses might
    be cached
  • a 'bool' filter IS cached.

So, for the two clauses (status = 'active') and (tag = 'foo'):

  • if you always use them together, then combine them with a 'bool'
    filter
  • if you use each clause often, but independently - e.g. perhaps you
    always use (status = 'active') but you combine it with many versions
    of (tag = $tag) - then rather use an 'or' filter

Similarly, for the 'terms' filter, if you always query
(tag = foo or tag=bar) together, then use the 'bool' execution.

If you have lots of combinations (eg (tag=foo), (tag=bar), (tag=foo or
tag=bar), (tag=foo or tag=baz) etc) then use the 'plain' (or) execution.

clint
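
A small sketch of the two terms filter variants described in this post, written as Python that only builds the JSON bodies. It assumes the terms filter takes an "execution" option with the values "bool" and "plain", as the post implies; the field name tag and its values come from the post, and the helper name terms_filter is hypothetical. The choice of execution mode reflects the reasoning in this post, which the following posts revisit.

```python
import json


def terms_filter(field, values, execution="plain"):
    """Build a terms filter body.

    `execution` is assumed to accept "plain" or "bool".
    """
    return {"terms": {field: list(values), "execution": execution}}


# The same pair of tags is always queried together: "bool" execution.
fixed_pair = terms_filter("tag", ["foo", "bar"], execution="bool")

# Many different tag combinations: "plain" execution.
varying = [
    terms_filter("tag", tags)
    for tags in (["foo"], ["bar"], ["foo", "baz"])
]

print(json.dumps(fixed_pair))
for body in varying:
    print(json.dumps(body))
```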


(Clinton Gormley) #7

As I understand it:

  • an 'or' filter is not cached. instead, the individual clauses might
    be cached
  • a 'bool' filter IS cached.

Apparently, the last line above is incorrect. From the docs:

    The result of the bool filter is not cached by default (though
    internal filters might be). The _cache can be set to true in
    order to enable caching.

So @kimchy: I'm a bit confused as well.
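
For illustration, the docs passage quoted above suggests that a bool filter's result could be cached by setting _cache on it explicitly. The Python sketch below only builds and prints such a request body. Placing "_cache" directly inside the "bool" object is an assumption drawn from that passage, and the fields foo and bar are reused from the earlier curl examples.

```python
import json

bool_filter = {
    "bool": {
        "should": [
            {"term": {"foo": 1}},
            {"term": {"bar": 2}},
        ],
        # Assumption based on the quoted docs: opt in to caching the
        # combined result, which is not cached by default.
        "_cache": True,
    }
}

query = {"query": {"constant_score": {"filter": bool_filter}}}
print(json.dumps(query, indent=2))
```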


(Shay Banon) #8

The way most filters work (range, terms, and any cached one) is that they
create a bitset with an "on"/"off" bit for each document, marking whether it
matches. The bool filter works by doing bitwise operations on those bitsets.
and/or/not filters work by being part of the iteration process over matching
docs, evaluating the clauses "on the fly" for each document.

Usually, for filters that already have a fixed bitset representation (those
include term, terms, range, and cached filters), it makes sense to use a
bool filter. For ones that don't, and that compute the filter per doc (like
the geo ones), it makes sense to use an or/and/not filter.
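
The difference between the two approaches can be sketched in a few lines of Python. This is a toy model only, not how Elasticsearch or Lucene implement it: a Python int stands in for the bitset, and all function names, document ids, and the "near" predicate are hypothetical.

```python
def to_bitset(matching_doc_ids):
    """Pack matching doc ids into an int used as a bitset (bit i set = doc i matches)."""
    bits = 0
    for doc_id in matching_doc_ids:
        bits |= 1 << doc_id
    return bits


def bool_should(bitsets):
    """bool-style OR: combine precomputed bitsets with bitwise operations."""
    result = 0
    for bits in bitsets:
        result |= bits
    return result


def or_on_the_fly(predicates, num_docs):
    """or-style OR: test each document against the clauses while iterating."""
    result = 0
    for doc_id in range(num_docs):
        if any(pred(doc_id) for pred in predicates):
            result |= 1 << doc_id
    return result


NUM_DOCS = 8
foo_docs = to_bitset([0, 3, 5])  # docs matching a term filter such as foo = 1
bar_docs = to_bitset([3, 6])     # docs matching a term filter such as bar = 2

# Both clauses already have bitsets, so one bitwise OR combines them.
combined = bool_should([foo_docs, bar_docs])


def near(doc_id):
    """Hypothetical per-document check, standing in for a geo-style filter."""
    return doc_id % 4 == 0


def in_bar(doc_id):
    """Look up a single document in the precomputed bar bitset."""
    return bool((bar_docs >> doc_id) & 1)


# One clause has no bitset, so each document is checked during iteration.
per_doc = or_on_the_fly([near, in_bar], NUM_DOCS)

print(format(combined, "08b"))
print(format(per_doc, "08b"))
```

In this model the bitwise path touches each bitset once regardless of how the clauses were computed, while the per-document path calls every predicate for every document until one matches.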

