Memory Issues with our cluster setup

Hi all

We've noticed that a couple of our data nodes have been having memory
issues, eventually leading to them dropping out of the cluster. Other nodes
in the cluster are nowhere near their memory limits. Looking at the GC logs
of the problem boxes, it seems that after some time (about 3 days) the
nodes start garbage collecting every 30 seconds. We also see some
out-of-memory errors in the main log file, which we think may be related to
the new logs that are created at midnight for the start of each day -
however, we are still in the process of confirming this.

I'm sure it's a configuration issue, but I'm not sure where to even look
to figure out what needs tweaking. I'm going to describe our setup below
in the hope that you may know how to help.

  • Deploying to AWS
  • Elasticsearch 1.1.0 with the equivalent aws-elasticsearch-plugin
  • Java 7 on CentOS 6
  • 6 data nodes (m1.large)
  • 3 master nodes (m1.medium)
  • All nodes have ES_HEAP_SIZE=5g
  • All nodes have MAX_OPEN_FILES=65535
  • 3 shards per index
  • 1 replica
  • TTL is set globally to 30 days
  • Flume is used to push log data into the cluster

The big one is the number of indices. We are using the cluster for our
application logging data. We have multiple applications, and we write each
application's logs for each day into a separate index, so new applications
will fire up and send their logs to our cluster over time. It's similar to
the Logstash setup, the difference being that each application writes to
its own daily index rather than one big global daily index, if that makes
sense. Currently we have about 250 indices.
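
For concreteness, the write path per application looks roughly like this (a
minimal sketch using the 1.x-era Python client rather than our real Flume
pipeline; the host, field names and helper are purely illustrative - the
index naming follows the <app>-YYYY-MM-DD pattern, e.g.
pyxis-aggregator-2014-04-17):

from datetime import date
from elasticsearch import Elasticsearch

es = Elasticsearch(["es-data-node:9200"])  # hypothetical endpoint

def index_log(app_name, log_line):
    # Each application gets its own daily index, e.g. "pyxis-aggregator-2014-04-17".
    index_name = "%s-%s" % (app_name, date.today().strftime("%Y-%m-%d"))
    es.index(index=index_name, doc_type="log", body={"message": log_line})

index_log("pyxis-aggregator", "example log line")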

One of the things we have been considering is scaling up to deal with the
issue, since we are on AWS. However, we would like to understand how
Elasticsearch distributes load amongst the data nodes: when we scale up we
would like the data to be distributed across the data nodes based on load,
otherwise scaling up may not have a significant effect.
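
For reference, the current distribution can be inspected with the cat APIs
(a sketch with the Python client; the host is illustrative):

from elasticsearch import Elasticsearch

es = Elasticsearch(["es-master:9200"])  # hypothetical endpoint

# Disk used/available and shard counts per node.
print(es.cat.allocation(v=True))

# Which shard of which index lives on which node - useful for spotting hot nodes.
print(es.cat.shards(v=True))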

Any help would be appreciated.

Dip


Document TTL can be a resource hog; you might want to look at Logstash's
curator instead to manage expiration.
It might also be beneficial to increase your shard count so you have one
shard per node, or alternatively look at
cluster.routing.allocation.awareness.attributes.
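
The curator approach boils down to deleting whole daily indices once they
age out, which is generally much cheaper than per-document TTL purging. A
rough sketch with the Python client, assuming the <app>-YYYY-MM-DD index
naming and 30-day retention described above (host and retention value are
illustrative):

from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch(["es-master:9200"])  # hypothetical endpoint
RETENTION = timedelta(days=30)
now = datetime.utcnow()

# Index names are assumed to end in -YYYY-MM-DD, e.g. pyxis-aggregator-2014-04-17.
for name in es.cat.indices(h="index").split():
    try:
        day = datetime.strptime(name[-10:], "%Y-%m-%d")
    except ValueError:
        continue  # skip indices that don't follow the daily naming scheme
    if now - day > RETENTION:
        es.indices.delete(index=name)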

ES will distribute things as evenly as possible based on the shard count,
however sometimes that doesn't work out; e.g. we have one node now with just
over 20GB of its 500GB disk free, despite other nodes having more free space.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



Mark, thanks for your response. We hit the same problem last night (before
making any of your suggested changes). Thankfully, this time we did a whole
load of analysis that may be of use.

We had one data node that hit a Java heap space error and left our
Elasticsearch cluster. We looked at this one node and found a few things.

  1. It had a heavy load prior to becoming unusable.
  2. It started to garbage collect frequently.
  3. The problematic node had high CPU and garbage collected every 30 seconds
    or so for 7 hours before we finally saw a heap space error in our logs, at
    which point the node was unresponsive and it left the cluster.
  4. The node contained shards of our largest indices.

The interesting and worrying thing was that during this 7-hour period the
Elasticsearch cluster itself was unusable (not accepting any reads) and only
recovered once the node had left. We think the cluster should perhaps have
recovered sooner and not got into such a state. Is anyone else seeing
similar issues?
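
For anyone watching for the same failure mode: the hours of back-to-back
old-gen GC show up in the nodes stats API well before the OutOfMemoryError,
so a simple poll of per-node heap usage can raise an alert before a node
goes dark. A rough sketch (Python client; the host and threshold are just
examples):

from elasticsearch import Elasticsearch

es = Elasticsearch(["es-master:9200"])  # hypothetical endpoint
HEAP_ALERT_PCT = 90  # example threshold - tune to taste

stats = es.nodes.stats(metric="jvm")
for node_id, node in stats["nodes"].items():
    pct = node["jvm"]["mem"]["heap_used_percent"]
    if pct >= HEAP_ALERT_PCT:
        print("node %s heap at %d%% - at risk of GC thrash / OOM" % (node["name"], pct))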

Dip


Also, here is what we are seeing in our logs. If it's an issue I'm more than
happy to raise a ticket on GitHub:

From the master Elasticsearch log:

[2014-04-16 18:57:17,298][WARN ][transport ] [Byrrah] Received response
for a request that has timed out, sent [31116ms] ago, timed out [1115ms] ago,
action [discovery/zen/fd/ping], node [[Aleta Ogord][EvVSzG4lQoq9A4gpCaVPsw]
[ip-xxxxx.compute.internal][inet[ip-xxxxxx.compute.internal/xxxxx:9300]]
[master=false]], id [38161759]

[2014-04-17 00:00:53,471][WARN ][transport ] [Byrrah] Received response
for a request that has timed out, sent [32850ms] ago, timed out [2850ms] ago,
action [discovery/zen/fd/ping], node [[Aleta Ogord][EvVSzG4lQoq9A4gpCaVPsw]
[ip-xxxxxx.compute.internal][inet[ip-xxxxx.compute.internal/xxxxxx:9300]]
[master=false]], id [38401500]

The above was repeated consistently until this:

[2014-04-17 07:07:21,179][INFO ][cluster.service ] [Byrrah] removed
{[Aleta
Ogord][EvVSzG4lQoq9A4gpCaVPsw][ip-xxxxxx.compute.internal][inet[ip-xxxxxx.compute.internal/xxxxxx:9300]]{master=false},},
reason: zen-disco-node_failed([Aleta Ogord][EvVSzG4lQoq9A4gpCaVPsw]
[ip-10-0-105-89.eu-west-1.compute.internal][inet
[ip-xxxxxx.compute.internal/xxxxxx:9300]][master=false]), reason failed to
ping, tried [3] times, each with maximum [30s] timeout

On the data node we saw these logs:

[monitor.jvm ] [Aleta Ogord] [gc][old][31491][1023] duration
[20s], collections [1]/[20.3s], total [20s]/[4.6h], memory
[4.8gb]->[4.8gb]/[4.9gb], all_pools {[young]
[25.4mb]->[14.6mb]/[133.1mb]}{[survivor] [0b]->[0b]/[16.6mb]}{[old]
[4.8gb]->[4.8gb]/[4.8gb]}

Until finally we got this:

[2014-04-17 08:42:53,630][WARN ][indices.ttl ] [Aleta Ogord] failed to
execute ttl purge
java.lang.OutOfMemoryError: Java heap space
[2014-04-17 08:52:42,761][WARN ][index.engine.internal ] [Aleta Ogord]
[pyxis-aggregator-2014-04-17][1] failed engine
java.lang.OutOfMemoryError: Java heap space

Throughout this entire period the cluster was in a green state but was not
usable.
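
The "failed to execute ttl purge" right before the heap errors lines up with
the earlier point about document TTL being a resource hog. A hedged sketch of
one way to take TTL out of the picture for new indices: disable _ttl via an
index template and rely on deleting old daily indices for retention (Python
client; the template name and pattern are illustrative, and this only affects
indices created after the template exists):

from elasticsearch import Elasticsearch

es = Elasticsearch(["es-master:9200"])  # hypothetical endpoint

# Applies only to indices created after the template exists; existing
# indices keep their current mapping.
es.indices.put_template(name="app-logs-defaults", body={
    "template": "*",                 # illustrative pattern; narrow it to your index names
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
    },
    "mappings": {
        "_default_": {
            "_ttl": {"enabled": False}  # rely on deleting old indices for retention instead
        }
    },
})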

Thanks
