Enabling filter cache

Ed_Kim · April 22, 2015, 6:41pm

Hi, I have a dynamic query built via the Java API that assembles a filtered
query depending on the parameter input. I have about a dozen filters
(mostly term filters) that may or may not be used, and I have a couple of
questions:

Is it ok to simply set the parent boolFilterBuilder cache setting to
true, or do I need to set cache=true for each filter?
Would it be a good practice to execute a dummy query with all the
filters to preemptively create the filter before it's released for actual
use?

Thanks in advance for your time

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/af0fc6a1-4951-4c83-9642-cb6b12e3e56f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

nik9000 · April 22, 2015, 6:53pm

On Wed, Apr 22, 2015 at 2:41 PM, Ed Kim edkim81@gmail.com wrote:

Is it ok to simply set the parent boolFilterBuilder cache setting to
true, or do I need to set cache=true for each filter?

Those do different things. One caches the combined results and one caches
each term. To be honest, term filters are rarely worth caching because just
hitting Lucene for them is so fast.
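
To make the difference concrete, here is a minimal sketch in plain Java. It is a toy model and not Elasticsearch code: the class name, the filter names, and the two maps standing in for caches are all assumptions made up for illustration. It only shows why caching each filter lets any combination reuse the same entries, while caching the combined result creates one entry per distinct combination.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Elasticsearch code: a "filter" is a name that maps
// to the set of matching doc ids.
class FilterCachingSketch {
    private final Map<String, BitSet> index;                             // stand-in for the inverted index
    final Map<String, BitSet> perFilterCache = new HashMap<>();          // one entry per filter
    final Map<List<String>, BitSet> combinedCache = new HashMap<>();     // one entry per combination

    FilterCachingSketch(Map<String, BitSet> index) {
        this.index = index;
    }

    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    // AND the filters together, caching each filter's bitset on its own.
    BitSet mustPerFilter(List<String> filters) {
        BitSet result = null;
        for (String f : filters) {
            BitSet cached = perFilterCache.computeIfAbsent(f, k -> (BitSet) index.get(k).clone());
            if (result == null) result = (BitSet) cached.clone();
            else result.and(cached);
        }
        return result == null ? new BitSet() : result;
    }

    // AND the filters together, caching only the combined result.
    BitSet mustCombined(List<String> filters) {
        BitSet cached = combinedCache.computeIfAbsent(new ArrayList<>(filters), this::intersect);
        return (BitSet) cached.clone();
    }

    private BitSet intersect(List<String> filters) {
        BitSet result = null;
        for (String f : filters) {
            if (result == null) result = (BitSet) index.get(f).clone();
            else result.and(index.get(f));
        }
        return result == null ? new BitSet() : result;
    }

    public static void main(String[] args) {
        Map<String, BitSet> index = new HashMap<>();
        index.put("color:red", bits(0, 1, 2));
        index.put("size:large", bits(1, 2, 3));
        index.put("brand:acme", bits(2, 3, 4));
        FilterCachingSketch s = new FilterCachingSketch(index);

        // Two different combinations that share one filter.
        s.mustPerFilter(Arrays.asList("color:red", "size:large"));
        s.mustPerFilter(Arrays.asList("size:large", "brand:acme"));
        s.mustCombined(Arrays.asList("color:red", "size:large"));
        s.mustCombined(Arrays.asList("size:large", "brand:acme"));

        System.out.println("per-filter cache entries: " + s.perFilterCache.size());
        System.out.println("combined cache entries: " + s.combinedCache.size());
    }
}
```

In this model the per-filter cache never holds more entries than there are filters, whereas the combined cache grows with the number of distinct combinations requested. Whether either kind of caching actually pays off for cheap term filters is a separate question that needs measuring.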

Would it be a good practice to execute a dummy query with all the
filters to preemptively create the filter before it's released for actual
use?

This is what warmers are for. They are applied to new segments to eagerly
load stuff, including the filter cache. Elasticsearch's filter cache is per
segment, so this is a good match. It's also why the filter cache doesn't have
to be invalidated: segments are write-once, and deletes are applied after
the results from the filter cache.
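
A rough way to picture this is the plain-Java sketch below. It is a toy model and not Elasticsearch or Lucene code: the `Segment` class, the string cache key, and the `warm` method are assumptions invented for illustration, and the real warmers API is not shown. The cached bitset for a segment is computed once and never changed; deleted documents are masked out after the cache lookup.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Elasticsearch or Lucene code.
class SegmentFilterCacheSketch {
    // One write-once segment: its matches for a filter never change.
    static final class Segment {
        final String name;
        final Map<String, BitSet> postings; // filter -> matching local doc ids
        final BitSet liveDocs;              // cleared bits mark deleted docs

        Segment(String name, Map<String, BitSet> postings, int docCount) {
            this.name = name;
            this.postings = postings;
            this.liveDocs = new BitSet(docCount);
            this.liveDocs.set(0, docCount);
        }

        void delete(int doc) {
            liveDocs.clear(doc);
        }
    }

    // Keyed by segment and filter; entries are never invalidated in place.
    private final Map<String, BitSet> cache = new HashMap<>();
    int computations = 0; // how often a filter was actually evaluated

    BitSet matches(Segment seg, String filter) {
        BitSet cached = cache.computeIfAbsent(seg.name + "/" + filter, k -> {
            computations++;
            return (BitSet) seg.postings.getOrDefault(filter, new BitSet()).clone();
        });
        BitSet result = (BitSet) cached.clone();
        result.and(seg.liveDocs); // deletes are applied after the cache lookup
        return result;
    }

    // Conceptually what a warmer does: evaluate the filter on a new segment
    // before real searches hit it.
    void warm(Segment newSegment, String filter) {
        matches(newSegment, filter);
    }

    public static void main(String[] args) {
        Map<String, BitSet> postings = new HashMap<>();
        BitSet red = new BitSet();
        red.set(0);
        red.set(2);
        red.set(3);
        postings.put("color:red", red);
        Segment seg = new Segment("seg_0", postings, 5);

        SegmentFilterCacheSketch cache = new SegmentFilterCacheSketch();
        cache.warm(seg, "color:red");
        System.out.println("before delete: " + cache.matches(seg, "color:red"));
        seg.delete(2);
        System.out.println("after delete:  " + cache.matches(seg, "color:red"));
        System.out.println("filter evaluations: " + cache.computations);
    }
}
```

In this model the delete changes what the search returns without touching the cached entry, which is the reason no invalidation is needed.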

Nik

Ed_Kim · April 22, 2015, 6:58pm

In terms of performance, are we talking nanoseconds saved by using term
filters, or possibly a few milliseconds? Given the performance requirements
for this query, even saving a few milliseconds is a lot. Also, it looks
like I should cache at the individual filter level, as they will be bundled
differently depending on the params. Thanks for the clarification!

nik9000 · April 22, 2015, 7:37pm

With term queries I imagine it's anywhere from nanoseconds saved to a net
loss to use the filter cache. You should really test it though because I'm
not 100% sure.

There was talk of Elasticsearch being more intelligent about which filters
it decides to cache, but I don't know where that's gone.

Nik

Ed_Kim · April 22, 2015, 10:15pm

I'm a bit confused. Is the terms filter slower because it has to iterate
through a list of bitsets, whereas Lucene already has access to the list of
matching documents via the inverted index?

Also, if I set cache=true for each individual filter, does it allow me to
create any permutation of my bool filter (given a set of filters) and make
use of the cache? Or will this create a new filter cache?
