Hi, I have a dynamic query built via the Java API that assembles a filtered
query depending on the input parameters. I have about a dozen filters
(mostly term filters) that may or may not be used, and I have a couple of
questions:
Is it OK to simply set the cache setting on the parent BoolFilterBuilder to
true, or do I need to set cache=true on each filter?
Would it be good practice to execute a dummy query with all the
filters to preemptively populate the filter cache before it's released for
actual use?
Is it ok to simply set the parent boolFilterBuilder cache setting to
true, or do I need to set cache=true for each filter?
Those do different things. Caching on the bool filter caches the combined
result, while caching on each filter caches each term's result separately.
To be honest, term filters are rarely worth caching because just hitting
Lucene for them is so fast.
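To make the difference in granularity concrete, here is a toy model in plain Java. It is not Elasticsearch code: `FilterCacheModel`, its methods, and the cache keys are all invented for illustration, and it assumes a cache that is simply keyed by filter. Under that assumption, caching each term filter lets any combination reuse the per-term bitsets, while caching only the combined bool result helps only when the same combination is requested again.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy model only: illustrates cache granularity, not Elasticsearch internals. */
final class FilterCacheModel {
    private final Map<String, int[]> postings = new HashMap<>(); // term -> matching doc ids
    private final Map<String, BitSet> cache = new HashMap<>();   // cache key -> cached bitset
    int termLookups = 0; // how often we had to go back to the "index"

    void index(String term, int... docIds) {
        postings.put(term, docIds);
    }

    /** Uncached lookup: build a bitset from the postings list. */
    private BitSet lookup(String term) {
        termLookups++;
        BitSet bits = new BitSet();
        for (int doc : postings.getOrDefault(term, new int[0])) {
            bits.set(doc);
        }
        return bits;
    }

    /** Analogue of cache=true on each term filter: one cache entry per term. */
    BitSet andWithPerTermCache(List<String> terms) {
        BitSet result = null;
        for (String term : terms) {
            BitSet bits = cache.get("term:" + term);
            if (bits == null) {
                bits = lookup(term);
                cache.put("term:" + term, bits);
            }
            if (result == null) {
                result = (BitSet) bits.clone(); // never mutate the cached copy
            } else {
                result.and(bits);
            }
        }
        return result == null ? new BitSet() : result;
    }

    /** Analogue of cache=true on the parent bool filter: one entry per combination. */
    BitSet andWithCombinedCache(List<String> terms) {
        String key = "bool:" + new TreeSet<>(terms); // order-insensitive key
        BitSet cached = cache.get(key);
        if (cached == null) {
            for (String term : terms) {
                BitSet bits = lookup(term);
                if (cached == null) {
                    cached = bits;
                } else {
                    cached.and(bits);
                }
            }
            if (cached == null) {
                cached = new BitSet();
            }
            cache.put(key, cached);
        }
        return (BitSet) cached.clone();
    }

    public static void main(String[] args) {
        FilterCacheModel m = new FilterCacheModel();
        m.index("color:red", 1, 2, 3);
        m.index("size:large", 2, 3, 4);
        m.index("brand:acme", 3, 4, 5);
        m.andWithPerTermCache(Arrays.asList("color:red", "size:large"));
        m.andWithPerTermCache(Arrays.asList("size:large", "brand:acme"));
        System.out.println("index lookups with per-term cache: " + m.termLookups);
    }
}
```

In this model the second per-term call reuses the cached `size:large` bitset, whereas the combined cache would have to recompute every term for a combination it has not seen before. Whether that reuse is worth the memory for cheap term filters is a separate question that only measurement can answer.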
Would it be a good practice to execute a dummy query with all the
filters to preemptively create the filter before it's released for actual
use?
This is what warmers are for. They are applied to new segments to eagerly
load things, including the filter cache. Elasticsearch's filter cache is per
segment, so this is a good match. It's also why the filter cache doesn't have
to be invalidated: segments are write-once, and deletes are applied on top of
the results from the filter cache.
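A rough way to picture this, as a plain-Java sketch rather than actual Elasticsearch or Lucene code (`SegmentModel` and all of its methods are hypothetical names): each segment owns its own cache, a warm-up step fills that cache before real traffic arrives, and deletes live in a separate bitset that is applied after the cached result is read, so the cached entry itself never needs to change.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: names and structure are invented for illustration. */
final class SegmentModel {
    private final Map<String, BitSet> postings = new HashMap<>();    // fixed once the segment is built
    private final Map<String, BitSet> filterCache = new HashMap<>(); // per-segment filter cache
    private final BitSet liveDocs = new BitSet();                    // cleared bit = deleted doc

    SegmentModel(int docCount) {
        liveDocs.set(0, docCount); // every doc starts out live
    }

    /** Builder-style helper used only while the segment is being created. */
    SegmentModel withTerm(String term, int... docIds) {
        BitSet bits = new BitSet();
        for (int doc : docIds) {
            bits.set(doc);
        }
        postings.put(term, bits);
        return this;
    }

    /** Warmer analogue: populate the cache eagerly, before any real search. */
    void warm(String term) {
        cachedBits(term);
    }

    /** A delete only flips a liveDocs bit; it does not touch the filter cache. */
    void delete(int docId) {
        liveDocs.clear(docId);
    }

    boolean isCached(String term) {
        return filterCache.containsKey(term);
    }

    private BitSet cachedBits(String term) {
        BitSet bits = filterCache.get(term);
        if (bits == null) {
            bits = (BitSet) postings.getOrDefault(term, new BitSet()).clone();
            filterCache.put(term, bits);
        }
        return bits;
    }

    /** Read the cached filter result, then mask out deleted docs. */
    BitSet search(String term) {
        BitSet hits = (BitSet) cachedBits(term).clone();
        hits.and(liveDocs);
        return hits;
    }

    public static void main(String[] args) {
        SegmentModel segment = new SegmentModel(4).withTerm("status:active", 0, 1, 3);
        segment.warm("status:active");
        segment.delete(1);
        System.out.println(segment.search("status:active"));
    }
}
```

A newly written segment would be a new `SegmentModel` with an empty cache, which is the gap that warming fills in this picture.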
In terms of performance, are we talking about nanoseconds saved by caching
term filters, or possibly a few milliseconds? Given the performance requirements
for this query, even saving a few milliseconds is a lot. Also, it looks
like I should cache at the individual filter level, as they will be bundled
differently depending on the params. Thanks for the clarification!
On Wed, Apr 22, 2015 at 11:53 AM, Nikolas Everett nik9000@gmail.com wrote:
[snip]
I'm a bit confused: is a cached term filter slower because it has to iterate
through a list of bitsets, whereas Lucene already has access to the list of
matching documents via the inverted index?
Also, if I set cache=true on each individual filter, can I build any
permutation of my bool filter (from a given set of filters) and still make
use of the cache? Or will each permutation create a new filter cache entry?
On Wed, Apr 22, 2015 at 12:37 PM, Nikolas Everett nik9000@gmail.com wrote:
With term queries I imagine it's anywhere from nanoseconds saved to a net
loss to use the filter cache. You should really test it, though, because I'm
not 100% sure. There was talk of Elasticsearch being more intelligent about
which filters it decides to cache, but I don't know where that's gone.