Unnecessary Cache Eviction Explained

Hi,

I've recently posted a question regarding mysterious field data cache
eviction[1]: ES is evicting field data cache entries even though it's
nowhere near the limit I've set.

In an unrelated post, someone found the root cause of this problem[2]: the
maximum cache size specified is actually split evenly between Guava's cache
partitions. ES configures Guava to use 16 partitions[3], which means that
each partition, and therefore the field data that happens to land in it, is
capped at indices.fielddata.cache.size / 16.

In my case, I configured the cache size to be 10GB, but saw eviction at
very low cache usage (1.5GB in some cases). This is because at least one of
the cache partitions hit its maximum size of 625MB.
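
The arithmetic is easy to check with a small sketch. It assumes decimal units (1GB = 1000MB) to match the 625MB figure, a plain even split across partitions, and a made-up workload in which one partition receives half of all cached bytes. The class and method names are hypothetical and are not part of ES or Guava.

```java
final class PartitionLimit {
    // Cap of a single partition under a plain even split (a simplification).
    static long perPartitionLimitMb(long totalMb, int partitions) {
        return totalMb / partitions;
    }

    // Total cache usage (MB) at the moment the busiest partition hits its cap,
    // assuming that partition receives hotSharePercent of all cached bytes.
    static long usageWhenHotPartitionFills(long totalMb, int partitions, int hotSharePercent) {
        return perPartitionLimitMb(totalMb, partitions) * 100 / hotSharePercent;
    }

    public static void main(String[] args) {
        long limit = perPartitionLimitMb(10_000, 16);
        long usage = usageWhenHotPartitionFills(10_000, 16, 50);
        System.out.println("per-partition limit: " + limit + "MB");
        System.out.println("eviction starts at about " + usage + "MB total");
    }
}
```

With these assumed numbers the busiest partition fills at 625MB while the cache as a whole holds only 1250MB of the configured 10GB, which is the kind of early eviction described above.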

Obviously, the short-term solution is to increase the field data cache
size, but this will require that we overcommit by quite a bit in order to
have partitions with a sensible size for our most frequent queries.

Until Guava provides a way to have a global maximum size instead of a
per-partition size (as mentioned by Craig in his post), it would be nice to
have a handle on the number of partitions created for this cache. If I set
this to 2, for example, I'm still allowing 2 threads to write to this cache
concurrently without having to overcommit my global field data cache size
(at least not by much).
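
To make the trade-off concrete, here is a rough sketch comparing partition counts. It again assumes a plain even split and decimal megabytes; the class and method names are hypothetical. In Guava the partition count is what CacheBuilder calls the concurrency level, which ES currently fixes at 16[3].

```java
final class PartitionTuning {
    // Cap of each partition when the configured size is split evenly (simplified).
    static long perPartitionMb(long totalMb, int partitions) {
        return totalMb / partitions;
    }

    // Total size that would have to be configured so that every partition
    // can hold targetPerPartitionMb on its own.
    static long requiredTotalMb(long targetPerPartitionMb, int partitions) {
        return Math.multiplyExact(targetPerPartitionMb, (long) partitions);
    }

    public static void main(String[] args) {
        long totalMb = 10_000; // the 10GB setting from this post, in decimal MB
        for (int partitions : new int[] {16, 4, 2}) {
            System.out.println(partitions + " partitions -> "
                    + perPartitionMb(totalMb, partitions) + "MB each");
        }
        // Overcommit needed to give each of 16 partitions 5000MB of room.
        System.out.println("required total: " + requiredTotalMb(5_000, 16) + "MB");
    }
}
```

With 2 partitions each one can hold 5000MB of a 10GB cache, whereas getting the same per-partition room with 16 partitions would mean configuring 80000MB in total.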

Anyone have another idea about how to deal with this?

Cheers,
Philippe
[1] https://groups.google.com/d/msg/elasticsearch/54XzFO44yJI/sd7Fm-WcrPcJ
[2] https://groups.google.com/d/msg/elasticsearch/42qrpYRJvsU/jhl3UZZG5sQJ
[3] https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/indices/fielddata/cache/IndicesFieldDataCache.java#L77

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/87d08d3a-b7e9-4d7a-b7c5-238cfa5e355f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Philippe,

Over the weekend I coded up a simple change to the Guava size-based
eviction algorithm to fix this. With my proposed change there are no API
changes, and it works as a drop-in replacement in ES. As you probably know,
ES renames and compiles in the Guava libraries, so actually deploying a new
build of Guava requires rebuilding ES.

The approach I took was to allow segments to grow larger than
maxSegmentWeight as long as the total cache size remains below the overall
maxWeight. Then, if eviction within one segment doesn't reduce the cache
weight to below maxWeight, I find the largest segment and evict from
there. I use tryLock() so that if another thread is already using that
segment, eviction will happen as new values are loaded there. Like I said,
simple.
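
For readers who want to picture the idea, here is a toy sketch of one possible reading of that description. It is not the actual patch and not Guava code: the class, its fields, and the FIFO eviction order are all assumptions made for illustration, and entries are reduced to bare weights.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

final class GlobalWeightCache {
    // One partition: a lock plus the weights of its entries, oldest first.
    static final class Segment extends ReentrantLock {
        final Deque<Long> entryWeights = new ArrayDeque<>();
        volatile long weight; // written only while holding this lock

        // Drops the oldest entry and returns the weight freed; caller holds the lock.
        long evictOldest() {
            Long w = entryWeights.pollFirst();
            if (w == null) return 0;
            weight -= w;
            return w;
        }
    }

    final Segment[] segments;
    final long maxWeight;
    final long maxSegmentWeight;
    final AtomicLong totalWeight = new AtomicLong();

    GlobalWeightCache(int segmentCount, long maxWeight) {
        this.segments = new Segment[segmentCount];
        for (int i = 0; i < segmentCount; i++) segments[i] = new Segment();
        this.maxWeight = maxWeight;
        this.maxSegmentWeight = maxWeight / segmentCount;
    }

    void put(int segmentIndex, long entryWeight) {
        Segment s = segments[segmentIndex];
        s.lock();
        try {
            s.entryWeights.addLast(entryWeight);
            s.weight += entryWeight;
            totalWeight.addAndGet(entryWeight);
            // A segment may exceed its own share; it only evicts locally
            // when the cache as a whole is over maxWeight.
            while (totalWeight.get() > maxWeight && s.weight > maxSegmentWeight
                    && s.entryWeights.size() > 1) {
                totalWeight.addAndGet(-s.evictOldest());
            }
        } finally {
            s.unlock();
        }
        // Still over the global limit: evict from the largest segment instead.
        while (totalWeight.get() > maxWeight) {
            Segment largest = largestSegment();
            if (!largest.tryLock()) return; // busy; it evicts on its own next put
            try {
                long freed = largest.evictOldest();
                if (freed == 0) return;
                totalWeight.addAndGet(-freed);
            } finally {
                largest.unlock();
            }
        }
    }

    Segment largestSegment() {
        Segment largest = segments[0];
        for (Segment s : segments) if (s.weight > largest.weight) largest = s;
        return largest;
    }

    public static void main(String[] args) {
        GlobalWeightCache cache = new GlobalWeightCache(4, 100);
        for (int i = 0; i < 9; i++) cache.put(0, 10); // segment 0 grows past 25
        cache.put(1, 10);
        cache.put(1, 10); // pushes the total over 100
        System.out.println("total weight: " + cache.totalWeight.get());
    }
}
```

In this toy run, segment 0 is allowed to hold 90 units even though its even share would be 25, and the one eviction that becomes necessary is taken from segment 0 (the largest) rather than from the segment that received the last write.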

Perhaps we can work together to review my change, make improvements on it,
and get it submitted?

Craig.

It seems I had the same
problem: https://groups.google.com/forum/#!topic/elasticsearch/2wJG5L9A8cs

That sounds great! I'd be happy to take a look at your change and possibly
do some testing locally. Is this hosted somewhere?

Cheers,
Philippe

Yes, that sounds exactly like the same problem.

On Mon, Sep 22, 2014 at 1:24 PM, Felipe Hummel felipehummel@gmail.com
wrote:

It seems I had the same problem:
https://groups.google.com/forum/#!topic/elasticsearch/2wJG5L9A8cs

You can find my proposed changes at https://code.google.com/r/craigwi-guava/.
Comments welcome.

I'm having trouble compiling the tests and so haven't run them yet.

Craig.

Hi,

It sounds like every single ES deployment out there suffers from this, or
am I missing something? Is there an ES issue where this could be tracked
(even if the problem is in Guava)?

Thanks,
Otis

Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

I opened issue #7836:
https://github.com/elasticsearch/elasticsearch/issues/7836.

Otis, from what I understand, the default size for the cache is unbounded,
so cache eviction should not occur due to inconsistent range checks in the
default case.

--
Ivan

Indeed, only instances with a value (greater than 0) specified for
indices.fielddata.cache.size are affected. This is what triggers the use of
Guava's eviction-based-on-size feature[1].

Philippe
[1]
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/indices/fielddata/cache/IndicesFieldDataCache.java#L74
