In our app we use a lot of term filters as part of OR filters. Term filters
are automatically cached. Are they also cached when they are part of an OR
filter?
I observe a sharp rise in filter cache size when users whose OR filters
contain a lot of term filters log in to the system.
No, not unless you specify that the OR filter should be cached.
You are right: you have to explicitly set _cache=true for an OR filter to be cached.
All the term filters inside that OR filter are cached by default unless _cache is explicitly set to false.
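To make the caching flags concrete, here is a minimal sketch that builds such a filter body as a Python dict and prints it as JSON. The helper names and the field name `tag` are made up for illustration; the `_cache` flag on the `or` filter and on an individual `term` filter is the one discussed in this thread.

```python
import json

def term_filter(field, value, cache=None):
    """Build a term filter; leave _cache out to keep the default behaviour."""
    body = {field: value}
    if cache is not None:
        body["_cache"] = cache
    return {"term": body}

def or_filter(filters, cache=False):
    """Wrap filters in an 'or' filter; its combined result is cached only on request."""
    body = {"filters": list(filters)}
    if cache:
        body["_cache"] = True
    return {"or": body}

# Hypothetical example: cache the OR result, but opt one term filter out of caching.
cached_or = or_filter(
    [term_filter("tag", "foo"), term_filter("tag", "bar", cache=False)],
    cache=True,
)
print(json.dumps(cached_or, indent=2))
```

Leaving `cache` at its default in `or_filter` produces the uncached form described above, where only the inner term filters are cached.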
Also, it might make sense to use a terms filter. And if you have a
combination of term/terms/range filters, it might make sense to use a bool
filter (which does bitwise operations) instead of an or filter. It will
probably improve performance.
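As a rough sketch of that suggestion, the helper below rewrites an `or` filter as a `bool` filter with `should` clauses, and shows a single `terms` filter standing in for several `term` filters. The function name `or_to_bool` and the fields `status`, `tag` and `age` are hypothetical; the sketch assumes a `bool` filter containing only `should` clauses matches when at least one clause matches.

```python
import json

def or_to_bool(or_filter):
    """Rewrite {'or': [...]} or {'or': {'filters': [...]}} as a bool/should filter."""
    clauses = or_filter["or"]
    if isinstance(clauses, dict):
        clauses = clauses["filters"]
    return {"bool": {"should": list(clauses)}}

# One 'terms' filter instead of several 'term' filters OR'ed together.
terms_filter = {"terms": {"tag": ["foo", "bar", "baz"]}}

# A mix of term/terms/range clauses, originally combined with 'or'.
original = {
    "or": {
        "filters": [
            {"term": {"status": "active"}},
            terms_filter,
            {"range": {"age": {"gte": 10, "lt": 20}}},
        ]
    }
}

as_bool = or_to_bool(original)
print(json.dumps(as_bool, indent=2))
```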
Shay,
I'm a bit confused about the performance of bool vs. and/or
filters. In your previous post you mentioned that it might make sense
to try a bool filter as it would be more performant than or. The docs
for and/or filters (dsl/and-filter.html) seem to say the opposite.
Could you clarify?
As I understand it:
- an 'or' filter is not cached; instead, the individual clauses might
  be cached
- a 'bool' filter IS cached
So, for the two clauses (status = 'active') and (tag = 'foo'):
- if you always use them together, then combine them with a 'bool'
  filter
- if you use each clause often, but independently (e.g. you always
  use (status = 'active') but combine it with many versions of
  (tag = $tag)), then rather use an 'or' filter
Similarly, for the 'terms' filter, if you always query
(tag = foo or tag = bar) together, then use the 'bool' execution.
If you have lots of combinations (e.g. (tag=foo), (tag=bar), (tag=foo or
tag=bar), (tag=foo or tag=baz), etc.) then use the 'plain' (or) execution.
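The choice of execution mode for a `terms` filter could be written down roughly as follows. This is only a sketch: the helper `terms_filter` and the field `tag` are hypothetical, and it deliberately accepts just the two execution values named in this post (`plain` and `bool`), even though other modes may exist.

```python
import json

def terms_filter(field, values, execution="plain"):
    """Build a terms filter with an explicit execution mode."""
    if execution not in ("plain", "bool"):
        raise ValueError("this sketch only covers 'plain' and 'bool'")
    return {"terms": {field: list(values), "execution": execution}}

# The same pair of tags is always queried together.
always_together = terms_filter("tag", ["foo", "bar"], execution="bool")

# Many different combinations of tags are queried.
many_combinations = terms_filter("tag", ["foo", "baz"])

print(json.dumps(always_together))
print(json.dumps(many_combinations))
```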
The way filters work (most of them, like range, terms, and any cached one)
is that each creates a bitset with an "on"/"off" bit per document, marking
which documents match. The bool filter works by doing bitwise operations on
those bitsets. The and/or/not filters instead work as part of the iteration
over matching docs, evaluating "on the fly" for each document.
Usually, for filters that already have a fixed bitset representation (term,
terms, range, and cached filters), it makes sense to use the bool filter.
For ones that don't, and that compute the filter per doc (like the geo
ones), it makes sense to use the or/and/not filters.
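The difference between the two strategies can be illustrated with a small toy model. This is a conceptual sketch only, not how Lucene or Elasticsearch implement it: bitsets are modelled as Python integers, documents as dicts, and all names here are made up.

```python
def bitset_for(docs, predicate):
    """Precompute a bitset (a Python int): bit i is set if docs[i] matches."""
    bits = 0
    for i, doc in enumerate(docs):
        if predicate(doc):
            bits |= 1 << i
    return bits

def bool_should(bitsets):
    """bool-style OR: combine precomputed bitsets with bitwise operations."""
    result = 0
    for bits in bitsets:
        result |= bits
    return result

def or_on_the_fly(docs, predicates):
    """or-style: evaluate the clauses per document while iterating."""
    return [i for i, doc in enumerate(docs) if any(p(doc) for p in predicates)]

def matching_ids(bits):
    """List the document ids whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

docs = [
    {"status": "active", "tag": "foo"},
    {"status": "inactive", "tag": "bar"},
    {"status": "inactive", "tag": "foo"},
    {"status": "active", "tag": "baz"},
]
is_active = lambda d: d["status"] == "active"
is_foo = lambda d: d["tag"] == "foo"

combined = bool_should([bitset_for(docs, is_active), bitset_for(docs, is_foo)])
print(matching_ids(combined))                    # [0, 2, 3]
print(or_on_the_fly(docs, [is_active, is_foo]))  # [0, 2, 3]
```

Both paths return the same documents. The bitset path pays off when the per-clause bitsets already exist (for example because they are cached), while the per-document path avoids building a bitset for clauses that are expensive to evaluate for every document.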