Elasticsearch memory usage on centralized log clusters

Hi all,

We've been running a three-node Elasticsearch cluster on Debian Wheezy since
last December. Due to the size of our network, we're getting about 600
messages/s (800 at peak times). Using logstash and daily indices, we're
currently at about 3TB of data spread over 158 indices. We use 2 replicas, so
all machines have all data.
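
In case it helps, the index count and total size above are easy to check with
the cat API (default HTTP port assumed):

  curl 'localhost:9200/_cat/indices?v'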

I'm struggling to understand what the expected working limits are for
Elasticsearch in a centralized logging situation and would really appreciate
input from others. What we're eventually trying to do is provide about 13
months of searchable logs via Kibana, but we're mostly running into RAM
constraints. Each node has 3TB of data, but the heap usage is almost
constantly up at 27-29GB.
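
For what it's worth, I'm taking those heap numbers from the node stats API,
roughly with these checks (default HTTP port assumed):

  # Heap used / max per node:
  curl -s 'localhost:9200/_nodes/stats/jvm?pretty'
  # Index-related memory: see the fielddata, filter_cache and segments sections:
  curl -s 'localhost:9200/_nodes/stats/indices?pretty'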

We did have a problem earlier with garbage collections that took so long that
a node would drop from the cluster. To fix this, we switched to the G1 GC,
which seems well suited to this workload. The machines we're using have 12
cores, so the added CPU overhead is negligible (general usage of the cores is
less than 30%). But I'm not enough of a Java dev to judge whether this switch
could be the cause of the constantly high heap usage. We're currently at about
1GB of RAM (heap) per 100GB of disk usage.
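
For completeness, the switch itself was just a matter of swapping GC flags. If
I remember the Elasticsearch 1.x defaults correctly, it came down to this (the
full Java options we run now are listed further down):

  # Dropped (the default CMS flags, from memory):
  #   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
  #   -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
  # Added instead:
  -XX:+UseG1GC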

My questions:

1- Is 1GB of RAM per 100GB of disk an expected usage pattern for an
index-heavy cluster?
2- Aside from closing indices (see the example just after this list), are
there other ways of lowering this?
2.5- Should I worry about it?
3- Are we approaching this the wrong way and should we change our setup?
4- Would upgrading to 1.3.1 change the usage significantly, due to the fix in
issue 6856, or is it unrelated?
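
To clarify what I mean in question 2: closing an index takes it offline and
releases its heap, and it can be reopened when needed. Roughly (index name is
just an example):

  # Close an old daily index to free its memory:
  curl -XPOST 'localhost:9200/logstash-2013.09.01/_close'
  # Reopen it if we need to search that far back again:
  curl -XPOST 'localhost:9200/logstash-2013.09.01/_open'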

The numbers:

3 nodes with the following config:

  • 92GB of RAM
  • 12 cores with HT
  • 6x 3.6TB spinning disks (we're contemplating adding CacheCade, since the
    controller supports it)
    • We expose the disks to Elasticsearch via 6 mount points
  • Debian Wheezy
  • Java 1.7.0_65 OpenJDK JRE with IcedTea 2.5.1 (Debian package with version
    7u65-2.5.1-2~deb7u1)
  • Elasticsearch 1.2.1 from the Elasticsearch repositories

Config snippets (leaving out IPs and the like):

bootstrap:
  mlockall: true
cluster:
  name: logging
  routing:
    allocation:
      node_concurrent_recoveries: 4
discovery:
  zen:
    minimum_master_nodes: 2
    ping:
      unicast:
        hosts:
          [snip]
index:
  number_of_replicas: 2
  number_of_shards: 6
indices:
  memory:
    index_buffer_size: 50%
  recovery:
    concurrent_streams: 5
    max_bytes_per_sec: 100mb
node:
  concurrent_recoveries: 6
  name: stor1-stor
path:
  data:
    - /srv/elasticsearch/a
    - /srv/elasticsearch/b
    - /srv/elasticsearch/c
    - /srv/elasticsearch/d
    - /srv/elasticsearch/f
    - /srv/elasticsearch/e

Java options: -Xms30g -Xmx30g -Xss256k -Djava.awt.headless=true
-XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError

Thanks for your help! Please let me know if you need more information.

--
Kind regards,
Tim Stoop


A lot of the answers for performance and capacity are "it depends".
You'll get much better performance from Oracle Java; 1.7u55 is the currently
recommended version. Given you're "experimenting" with the G1 GC (which isn't
current best practice, hence "experimenting"), you might even want to try
Java 1.8.

If you want to drop memory use, you can disable bloom filtering.
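
For logstash-style daily indices it's an index-level setting you can flip on
indices you no longer write to; off the top of my head it's something like
this (setting name from memory, index name just an example):

  curl -XPUT 'localhost:9200/logstash-2014.06.01/_settings' -d '
  { "index.codec.bloom.load": false }'

I believe curator can also do this for you across whole date ranges.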

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 31 July 2014 19:16, Tim Stoop tim.stoop@gmail.com wrote:

[snip]

Hi Mark,

Thanks for your reply!

On Thursday, 31 July 2014 12:16:36 UTC+2, Mark Walkom wrote:

A lot of the answers for performance and capacity are "it depends".

Well, I'm currently not that worried about performance, as long as it can
keep up with the amount of data we throw at it. It's currently 800 messages/s
at peak times, but we expect to grow to about 2000. Beyond that, as long as
searches don't time out, I'd be happy. I'm much more worried about stability
at the moment, hence my question about whether I should be worried about the
memory usage I'm seeing.

You'll get much better performance from Oracle Java; 1.7u55 is the currently
recommended version. Given you're "experimenting" with the G1 GC (which isn't
current best practice, hence "experimenting"), you might even want to try
Java 1.8.

OK, I'll try the Oracle JRE. Regarding G1 being experimental, I assume you
mean from ES's point of view, right? From what I read, it's fully supported in
Java 7. I didn't find any other way to solve the 'stop the world' collections
that CMS ran into every few hours :S I'm not a Java dev, though; I just wanted
something that didn't crash once a day.

If you want to drop memory use, you can disable bloom filtering.

Done and that did indeed help a little.

--
Kind regards,
Tim Stoop


The G1 GC is experimental in that it's not recommended by the ES team, as you
guessed, even if it is supported within Java.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 31 July 2014 21:30, Tim Stoop tim.stoop@gmail.com wrote:

[snip]