Field Data Cache Size and Eviction

Hi,

I have a cluster with nodes configured with an 18G heap. We've noticed a
degradation in performance recently, after increasing the volume of data
we're indexing.

I think the issue is due to the field data cache doing evictions. Some nodes
are doing lots of them, some aren't doing any. This is explained by our
routing strategy, which results in a non-uniform document distribution. Maybe
we can improve this eventually, but in the meantime I'm trying to
understand why the nodes are evicting cached data.

The metrics show that the field data cache is only ~1.5GB in size, yet we
have this in our elasticsearch.yml:

indices.fielddata.cache.size: 10gb

Why would a node evict cache entries when it should still have plenty of
room to store more? Are we missing another setting? Is there a way to tell
what the actual fielddata cache size is at runtime (maybe it did not pick up
the configuration setting for some reason)?
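
For reference, something along these lines should show both -- a sketch,
assuming the nodes info and nodes stats endpoints expose the effective
settings and the live usage:

  # Confirm the node actually picked up indices.fielddata.cache.size
  # (nodes info, settings metric; flat_settings makes it easier to grep)
  curl -s 'localhost:9200/_nodes/settings?flat_settings=true&pretty' | grep fielddata

  # Current fielddata memory usage and eviction counters, per node
  curl -s 'localhost:9200/_nodes/stats/indices/fielddata?pretty'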

Thanks,
Philippe

Forgot to mention that we're using ES 1.1.1

Sorry for bumping this, but I'm a little stumped here.

We have some nodes that are evicting fielddata cache entries for seemingly
no reason:

  1. we've set indices.fielddata.cache.size to 10gb
  2. the metrics from the node stats endpoint show that
     indices.fielddata.memory_size_in_bytes never exceeded 3.6GB on any node
  3. the eviction rate is normally 0, but it occasionally spikes even though
     the fielddata cache size is nowhere near 10GB (a quick way to watch this
     is sketched just after this list)
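
The polling sketch mentioned above -- assuming the cat nodes API exposes the
fielddata.memory_size and fielddata.evictions columns, as it appears to in
the 1.x series:

  # Watch per-node fielddata size alongside the eviction counter
  while true; do
    date
    curl -s 'localhost:9200/_cat/nodes?v&h=name,fielddata.memory_size,fielddata.evictions'
    sleep 30
  done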

Attached is a plot of the max(indices.fielddata.memory_size_in_bytes) (red
line) and sum(indices.fielddata.evictions) (green line) across all nodes in
the cluster. Note that we create a fresh index every day that replaces an
older one (which explains the change in profile around midnight).

As you can see, the size (on any given node) never exceeds 3.6GB, yet even
at a lower value (around 2.2GB), some nodes start evicting entries from the
cache. Also, starting around Tue 8AM, the max(field cache size) becomes
erratic and jumps up and down.

I can't explain this behaviour, especially since we've been operating at this
volume and rate of documents for a while; this was not happening before.
It's possible that we're getting a higher volume of data, but it doesn't
look substantially different from the past.

Under what circumstances will an ES node evict entries from its field data
cache? We're also deleting documents from the index; can this have an
impact? What other things should I be looking at to find a correlation (GC
time does not seem to be correlated)?
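
One thing that might make the correlation easier to see is pulling the GC
and fielddata numbers in a single call -- a sketch, assuming the nodes stats
endpoint accepts a comma-separated list of metrics:

  # JVM (GC) stats and indices stats (fielddata size/evictions) in one
  # response, so spikes can be lined up per node and per timestamp
  curl -s 'localhost:9200/_nodes/stats/jvm,indices?pretty'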

Thanks,
Philippe

A bit late after the OP posted this, and not sure if it is still relevant,
but anyway...

Under what circumstances will an ES node evict entries from its field
data cache? We're also deleting documents from the index; can this have an
impact? What other things should I be looking at to find a correlation (GC
time does not seem to be correlated)?

The cache implements an LRU eviction policy: when a cache becomes full,
the least recently used data is evicted to make way for new data.
http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-cache.html

There is more information here:
Monitoring Individual Nodes | Elasticsearch: The Definitive Guide [2.x] | Elastic

It's puzzling in your case that the cache size is set to 10GB but per-node
usage only reaches 3.6GB. Have you used the other API to check whether the
cache reports the same numbers?
http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-shard-query-cache.html#_monitoring_cache_usage
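
For example, the index-level stats API can break the usage down per field --
a sketch, assuming the fielddata metric and the fields parameter work as
described in the 1.x reference:

  # Index-level fielddata usage, broken down per field
  curl -s 'localhost:9200/_stats/fielddata?fields=*&pretty'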

There are also a few additional links which might give you hints.

Hope it helps.

Jason
