Field Data Cache Size and Eviction

Hi,

I have a cluster with nodes configured with an 18G heap. We've noticed a
degradation in performance recently, after increasing the volume of data
we're indexing.

I think the issue is due to the field data cache doing evictions. Some nodes
are doing lots of them, some aren't doing any. This is explained by our
routing strategy, which results in a non-uniform document distribution. Maybe
we can improve this eventually, but in the meantime I'm trying to
understand why the nodes are evicting cached data.

The metrics show that the field data cache is only ~1.5GB in size, yet we
have this in our elasticsearch.yml:

indices.fielddata.cache.size: 10gb

Why would a node evict cache entries when it should still have plenty of
room to store more? Are we missing another setting? Is there a way to tell
what the actual fielddata cache size is at runtime (maybe it did not pick up
the configuration setting for some reason)?
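
For reference, something along these lines should show both -- a sketch,
assuming the nodes info and nodes stats endpoints expose the effective
settings and the live usage:

  # Confirm the node actually picked up indices.fielddata.cache.size
  # (nodes info, settings metric; flat_settings makes it easier to grep)
  curl -s 'localhost:9200/_nodes/settings?flat_settings=true&pretty' | grep fielddata

  # Current fielddata memory usage and eviction counters, per node
  curl -s 'localhost:9200/_nodes/stats/indices/fielddata?pretty'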

Thanks,
Philippe

Forgot to mention that we're using ES 1.1.1

Sorry for bumping this, but I'm a little stumped here.

We have some nodes that are evicting fielddata cache entries for seemingly
no reason:

  1. we've set indices.fielddata.cache.size to 10gb
  2. the metrics from the node stats endpoint show that
     indices.fielddata.memory_size_in_bytes never exceeded 3.6GB on any node
  3. the eviction rate is normally 0, but it occasionally spikes even though
     the fielddata cache size is nowhere near 10GB (a quick way to watch this
     is sketched just after this list)
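
The polling sketch mentioned above -- assuming the cat nodes API exposes the
fielddata.memory_size and fielddata.evictions columns, as it appears to in
the 1.x series:

  # Watch per-node fielddata size alongside the eviction counter
  while true; do
    date
    curl -s 'localhost:9200/_cat/nodes?v&h=name,fielddata.memory_size,fielddata.evictions'
    sleep 30
  done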

Attached is a plot of the max(indices.fielddata.memory_size_in_bytes) (red
line) and sum(indices.fielddata.evictions) (green line) across all nodes in
the cluster. Note that we create a fresh index every day that replaces an
older one (which explains the change in profile around midnight).

As you can see, the size (on any given node) never exceeds 3.6GB, yet even
at a lower value (around 2.2GB), some nodes start evicting entries from the
cache. Also, starting around Tue 8AM, the max(field cache size) becomes
erratic and jumps up and down.

I can't explain this behaviour, especially since we've been operating at this
volume and rate of documents for a while; this was not happening before.
It's possible that we're getting a higher volume of data, but it doesn't
look substantially different from the past.

Under what circumstances will an ES node evict entries from its field data
cache? We're also deleting documents from the index; can this have an
impact? What other things should I be looking at to find a correlation (GC
time does not seem to be correlated)?
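
One thing that might make the correlation easier to see is pulling the GC
and fielddata numbers in a single call -- a sketch, assuming the nodes stats
endpoint accepts a comma-separated list of metrics:

  # JVM (GC) stats and indices stats (fielddata size/evictions) in one
  # response, so spikes can be lined up per node and per timestamp
  curl -s 'localhost:9200/_nodes/stats/jvm,indices?pretty'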

Thanks,
Philippe

A bit late after the OP posted this, and not sure if it is still relevant,
but anyway...

Under what circumstances will an ES node evict entries from its field
data cache? We're also deleting documents from the index; can this have an
impact? What other things should I be looking at to find a correlation (GC
time does not seem to be correlated)?

The cache implements an LRU eviction policy: when a cache becomes full,
the least recently used data is evicted to make way for new data.
http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-cache.html

There is more information here:
Monitoring Individual Nodes | Elasticsearch: The Definitive Guide [2.x] | Elastic

It's puzzling in your case that the cache size is set to 10GB but per-node
usage only reaches 3.6GB. Have you used the other API to check whether the
cache reports the same numbers?
http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-shard-query-cache.html#_monitoring_cache_usage
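
For example, the index-level stats API can break the usage down per field --
a sketch, assuming the fielddata metric and the fields parameter work as
described in the 1.x reference:

  # Index-level fielddata usage, broken down per field
  curl -s 'localhost:9200/_stats/fielddata?fields=*&pretty'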

There are also a few additional links which might give you hints.

Hope it helps.

Jason
