How to monitor for filter cache churn?


(Tikitu de Jager) #1

Hi folks,

I'm optimising our queries based on the advice in Zachary Tong's
presentation:
https://speakerdeck.com/polyfractal/elasticsearch-query-optimization
So far just switching all our query elements to filters has given a 6x
speedup on a monster query (65K chars of compact JSON), which is very
encouraging :slight_smile:

All our queries are auto-generated from our own query syntax, though, so if
we switch to filters it's gonna have to be pretty much across the board
(all terminals in the query AST, or all boolean nodes, or some similarly
blunt instrument). Which makes me worry about cache churn.

Actually I have two questions:

  1. Can I monitor the filter cache size and eviction rate somehow? (REST
    for preference, but jmx would be fine too.) I only seem to see
    documentation for the field data cache.

  2. Any advice for caching/not caching the intermediate boolean nodes in a
    complex query? In our case many of these intermediate nodes will recur in
    other queries, so my default feeling is to cache them, but that has to be
    balanced against the extra cache usage (and risk of churn). So I guess the
    question is, just how fast is the bitset bool filter (we frequently have
    ANDs and ORs with 10 to 20 children) compared to caching the node? Should I
    even be considering caching these, or is the bitset combination fast enough
    to make it a no-brainer?

Cheers,
Tikitu

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ad493810-dad7-4018-9d71-256df58eebc1%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Otis Gospodnetić) #2

Hi Tikitu,

Re 1. and filter cache size + eviction monitoring, here is an
example: https://apps.sematext.com/spm-reports/s/b5g0cSyGm0

Otis

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



(Tikitu de Jager) #3

Aha, I find it under cluster stats, under the "indices" key:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-stats.html
That's unexpected.

So next question: is there a way to get that information per-node?

Otis, is the cluster stats info what you're charting there, or do you get
it from somewhere else? (The chart I see has some oddities, btw: count
and evictions seem to sit at 0, which is unexpected given that the size
metric rises and falls.)

Cheers,
Tikitu



(Zachary Tong) #4

You can monitor the filter cache at three different levels - index, node and
cluster. The output is similar at all three levels: you'll see a size in
bytes and an eviction count.
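
At whichever level you query, the numbers live under a filter_cache object.
A minimal Python sketch of pulling them out of a node-stats response; the
sample payload and key paths are assumptions based on the 0.90-era stats
format, so check them against your own cluster's output:

```python
import json

# Hypothetical excerpt of a node-stats response (node id and numbers made up).
node_stats = json.loads("""
{
  "nodes": {
    "abc123": {
      "name": "node-1",
      "indices": {
        "filter_cache": {
          "memory_size_in_bytes": 10485760,
          "evictions": 42
        }
      }
    }
  }
}
""")

# Pull the per-node filter cache size and eviction count.
for node_id, node in node_stats["nodes"].items():
    fc = node["indices"]["filter_cache"]
    print(node_id, fc["memory_size_in_bytes"], fc["evictions"])
```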

Regarding your question about bitset combination speed and caching
intermediate booleans...it depends (heh). The question is less about the
speed of combining, say, 50 individual bitsets versus 1 combined bitset.
The single bitset will obviously be faster, but the difference is
pretty negligible compared to other operations.

What you should be thinking about is the effect of a "cache miss". Let's
operate under the assumption that filter cache size is limited, and
evictions will occur in some fashion (otherwise we'd just keep everything
in memory and be happy).

If you have a boolean combination of 50 filters that is cached, you only
need to keep that single filter "hot" in the cache. If your usage pattern
keeps it cached, you will have very little churn. But if it happens that
there is a lull and the combo-filter falls out of cache, the next time it
is executed you'll need to re-evaluate all 50 "interior" filters to derive
the final bitset. Those interior filters could potentially touch a large
number of documents (and the associated disk access). A cache miss could
be relatively expensive: still fast in absolute terms, but slow relative
to simple bitset lookups.

When filters are specified independently, there is a greater chance that
individual filters may be missing from the cache. Each execution of the
set of filters may require a few of the "interior" filters to be evaluated,
but since the filters are cached separately there is a good chance the
majority of them remain cached. So the computational cost
is amortized over time instead of being lumpy. There is also a better
chance that filters stay cached since they are reused in other parts of
your query, which keeps them "live" even if the total combination rarely
re-occurs.

It also gets complicated because filter caching is technically per-segment.
It is possible for the very same filter to be cached in one segment but
evicted on another segment. The LRU cache tries to evict old (unused)
filters, but weights towards smaller segments since they are cheaper to
recalculate. This also means that data indexing has an effect on filter
caching, since a constant ingestion of new documents means constant segment
merges, which clear the caches for the newly created segments.

Some other assorted thoughts:

  • Remember that boolean caching will cache the result of the bool, not
    the filters themselves. E.g. if you have a bool of 10 Terms, the cached
    bitset is the set of documents that matches the whole bool, not the 10
    individual Term filters. Probably obvious, but wanted to make it clear

  • Eviction metrics are...meh at best. Like I mentioned above, evictions
    are per-segment, and weighted towards small segments. You can see high
    eviction rates without them amounting to much churn (e.g. lots of newly
    created, small segments are evicting when they merge, but 95% of your data
    remains safely cached). You can even see a lot of churn but still get
    good performance, since big segments tend to keep their caches around and
    they account for most of your data.

  • I'd tune by setting a cache size, timing query latency across a wide
    variety of queries, and loosely watching eviction metrics. Bump the filter
    cache size, repeat, and see if latency gets better. A similar iterative
    approach works for deciding what to cache.

  • In general, I only reach for caching boolean combinations when I know
    it will get hit very often.
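
The tuning loop in the third bullet can be sketched as a simple pass over
measured (cache size, latency) pairs. A hedged illustration only: the
numbers are made up and pick_cache_size is a hypothetical helper, not a
real benchmark harness:

```python
def pick_cache_size(measurements, tolerance=0.05):
    """Given (cache_size_mb, median_latency_ms) pairs from benchmark runs,
    pick the smallest cache size whose latency is within `tolerance`
    (as a fraction) of the best latency observed."""
    best = min(lat for _, lat in measurements)
    candidates = [(size, lat) for size, lat in measurements
                  if lat <= best * (1 + tolerance)]
    return min(candidates)[0]

# Hypothetical runs: latency stops improving past 512 MB.
runs = [(128, 90.0), (256, 62.0), (512, 41.0), (1024, 40.5)]
print(pick_cache_size(runs))  # -> 512 (smallest size within 5% of best)
```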

This turned into a really long message! Let me know if you have any
questions :slight_smile:
-Zach



(Zachary Tong) #5

Oh, final note that I forgot: if you cache the boolean combination of
filters, the individual leaf node filters will still be cached by default.
You'll have to explicitly disable caching for leaf nodes if you don't want
that behavior.
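
In the 0.90-era filter DSL that looks roughly like the sketch below; the
field names are hypothetical and the _cache placement is my reading of the
docs, so double-check it against the reference for your version:

```python
import json

# Cache the bool's combined bitset, but opt the leaf filters out of the
# cache. "status" and "category" are made-up fields for illustration.
filt = {
    "bool": {
        "_cache": True,  # cache the combined result of the bool
        "must": [
            {"term": {"status": "active", "_cache": False}},
            {"term": {"category": "news", "_cache": False}},
        ],
    }
}
print(json.dumps(filt, indent=2))
```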

-Z



(Tikitu de Jager) #6

Zach, thanks very much for that. Of course "it depends" but you've given me
lots of good tips to keep in mind. (Especially the warning not to take
eviction metrics too seriously probably saved me hours of sweat and
worry.)

My stats calls behave a bit differently (0.90.9) so I list them here in
case anyone else runs into the same confusion:

Per-index:
curl -XGET "http://localhost:9200/<my_index>/_stats?filter_cache"
(filter_cache entry not shown by default; param ?all shows it as well)

Per-node:
curl -XGET "http://localhost:9200/_nodes/stats?all"
curl -XGET "http://localhost:9200/_nodes/stats?indices"
(without ?all or ?indices the indices.filter_cache entry isn't included; I
didn't find a way to specify only that entry)

Per-cluster:
curl -XGET "http://localhost:9200/_cluster/stats"
(included by default)

Thanks again for your help!
Tikitu


