How to monitor for filter cache churn?


(Tikitu de Jager) #1

Hi folks,

I'm optimising our queries based on the advice in Zachary Tong's
presentation:
https://speakerdeck.com/polyfractal/elasticsearch-query-optimization
So far just switching all our query elements to filters has given a 6x
speedup on a monster query (65K chars of compact JSON), which is very
encouraging :slight_smile:

All our queries are auto-generated from our own query syntax, though, so if
we switch to filters it's gonna have to be pretty much across the board
(all terminals in the query AST, or all boolean nodes, or some similarly
blunt instrument). Which makes me worry about cache churn.

Actually I have two questions:

  1. Can I monitor the filter cache size and eviction rate somehow? (REST
    for preference, but jmx would be fine too.) I only seem to see
    documentation for the field data cache.

  2. Any advice for caching/not caching the intermediate boolean nodes in a
    complex query? In our case many of these intermediate nodes will recur in
    other queries, so my default feeling is to cache them, but that has to be
    balanced against the extra cache usage (and risk of churn). So I guess the
    question is, just how fast is the bitset bool filter (we frequently have
    ANDs and ORs with 10 to 20 children) compared to caching the node? Should I
    even be considering caching these, or is the bitset combination fast enough
    to make it a no-brainer?

Cheers,
Tikitu

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ad493810-dad7-4018-9d71-256df58eebc1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Otis Gospodnetić) #2

Hi Tikitu,

Re 1. and filter cache size + eviction monitoring, here is an
example: https://apps.sematext.com/spm-reports/s/b5g0cSyGm0

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



(Tikitu de Jager) #3

Aha, I find it under cluster stats, under the "indices" key:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-stats.html
That's unexpected.

So next question: is there a way to get that information per-node?

Otis, is the cluster stats info what you're charting there, or do you get
it from somewhere else? (The chart I see has some oddities, btw: count
and evictions seem to sit at 0, which is unexpected given that the size
metric rises and falls.)

Cheers,
Tikitu



(Zachary Tong) #4

You can monitor the filter cache at three different levels - index, node and
cluster. The output is similar at all three levels: you'll see a size in
bytes and an eviction count.
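
At whichever level you query, the numbers live under a filter_cache object.
A minimal Python sketch of pulling them out of a node-stats response; the
sample payload and key paths are assumptions based on the 0.90-era stats
format, so check them against your own cluster's output:

```python
import json

# Hypothetical excerpt of a node-stats response (node id and numbers made up).
node_stats = json.loads("""
{
  "nodes": {
    "abc123": {
      "name": "node-1",
      "indices": {
        "filter_cache": {
          "memory_size_in_bytes": 10485760,
          "evictions": 42
        }
      }
    }
  }
}
""")

# Pull the per-node filter cache size and eviction count.
for node_id, node in node_stats["nodes"].items():
    fc = node["indices"]["filter_cache"]
    print(node_id, fc["memory_size_in_bytes"], fc["evictions"])
```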

Regarding your question about bitset combination speed and caching
intermediate booleans...it depends (heh). The question is less about the
speed of combining, say, 50 individual bitsets versus 1 combined bitset.
The single bitset will obviously be faster, but the difference is
pretty negligible compared to other operations.

What you should be thinking about is the effect of a "cache miss". Let's
operate under the assumption that filter cache size is limited, and
evictions will occur in some fashion (otherwise we'd just keep everything
in memory and be happy).

If you have a boolean combination of 50 filters that is cached, you only
need to keep that single filter "hot" in the cache. If your usage pattern
keeps it cached, you will have very little churn. But if it happens that
there is a lull and the combo-filter falls out of cache, the next time it
is executed you'll need to re-evaluate all 50 "interior" filters to derive
the final bitset. Those interior filters could potentially touch a large
number of documents (and the associated disk access). A cache miss could
be relatively expensive: still fast in absolute terms, but slow relative
to simple bitset lookups.

When filters are specified independently, there is a greater chance that
individual filters may be missing from the cache. Each execution of the
set of filters may require a few of the "interior" filters to be evaluated,
but since the filters are cached separately there is a good chance the
majority of them remain cached. So the computational cost
is amortized over time instead of being lumpy. There is also a better
chance that filters stay cached since they are reused in other parts of
your query, which keeps them "live" even if the total combination rarely
re-occurs.

It also gets complicated because filter caching is technically per-segment.
It is possible for the very same filter to be cached in one segment but
evicted on another segment. The LRU cache tries to evict old (unused)
filters, but weights towards smaller segments since they are cheaper to
recalculate. This also means that data indexing has an effect on filter
caching, since a constant ingestion of new documents means constant segment
merges, which clear the caches for the newly created segments.

Some other assorted thoughts:

  • Remember that boolean caching will cache the result of the bool, not
    the filters themselves. E.g. if you have a bool of 10 Terms, the cached
    bitset is the set of documents that matches the whole bool, not the 10
    individual Term filters. Probably obvious, but wanted to make it clear

  • Eviction metrics are...meh at best. Like I mentioned above, evictions
    are per-segment, and weighted towards small segments. You can see high
    eviction rates without them amounting to much churn (e.g. lots of newly
    created, small segments are evicting when they merge, but 95% of your data
    remains safely cached). You can even see a lot of churn but still get
    good performance, since big segments tend to keep their caches around and
    they account for most of your data.

  • I'd tune by setting a cache size, timing query latency across a wide
    variety of queries, and loosely watching eviction metrics. Bump the filter
    cache size, repeat, and see if latency gets better. A similar iterative
    approach works for deciding what to cache.

  • In general, I only reach for caching boolean combinations when I know
    it will get hit very often.
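
The tuning loop in the third bullet can be sketched as a simple pass over
measured (cache size, latency) pairs. A hedged illustration only: the
numbers are made up and pick_cache_size is a hypothetical helper, not a
real benchmark harness:

```python
def pick_cache_size(measurements, tolerance=0.05):
    """Given (cache_size_mb, median_latency_ms) pairs from benchmark runs,
    pick the smallest cache size whose latency is within `tolerance`
    (as a fraction) of the best latency observed."""
    best = min(lat for _, lat in measurements)
    candidates = [(size, lat) for size, lat in measurements
                  if lat <= best * (1 + tolerance)]
    return min(candidates)[0]

# Hypothetical runs: latency stops improving past 512 MB.
runs = [(128, 90.0), (256, 62.0), (512, 41.0), (1024, 40.5)]
print(pick_cache_size(runs))  # -> 512 (smallest size within 5% of best)
```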

This turned into a really long message! Let me know if you have any
questions :slight_smile:
-Zach



(Zachary Tong) #5

Oh, final note that I forgot: if you cache the boolean combination of
filters, the individual leaf node filters will still be cached by default.
You'll have to explicitly disable caching for leaf nodes if you don't want
that behavior.
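
In the 0.90-era filter DSL that looks roughly like the sketch below; the
field names are hypothetical and the _cache placement is my reading of the
docs, so double-check it against the reference for your version:

```python
import json

# Cache the bool's combined bitset, but opt the leaf filters out of the
# cache. "status" and "category" are made-up fields for illustration.
filt = {
    "bool": {
        "_cache": True,  # cache the combined result of the bool
        "must": [
            {"term": {"status": "active", "_cache": False}},
            {"term": {"category": "news", "_cache": False}},
        ],
    }
}
print(json.dumps(filt, indent=2))
```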

-Z



(Tikitu de Jager) #6

Zach, thanks very much for that. Of course "it depends" but you've given me
lots of good tips to keep in mind. (Especially the warning not to take
eviction metrics too seriously probably saved me hours of sweat and
worry.)

My stats calls behave a bit differently (0.90.9) so I list them here in
case anyone else runs into the same confusion:

Per-index:
curl -XGET "http://localhost:9200/<my_index>/_stats?filter_cache"
(filter_cache entry not shown by default; param ?all shows it as well)

Per-node:
curl -XGET "http://localhost:9200/_nodes/stats?all"
curl -XGET "http://localhost:9200/_nodes/stats?indices"
(without ?all or ?indices the indices.filter_cache entry isn't included; I
didn't find a way to specify only that entry)

Per-cluster:
curl -XGET "http://localhost:9200/_cluster/stats"
(included by default)

Thanks again for your help!
Tikitu


