Apparent memory leak after a few days of heavy indexing

I have a two-node cluster. Each node runs with a 16GB heap. The nodes
run on their own boxes and have plenty of CPU to themselves. Currently, the
cluster's only workload is indexing at pretty high volume. We are indexing
using the bulk index API, and are sending about 10 batches of 400 documents
per second. We're using the Java client, specifically TransportClient.
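
For the curious, the indexing path boils down to roughly the following
(a trimmed-down sketch against the 0.90.x Java API -- the cluster name,
host, index name, type, and document body are placeholders, not our real
values):

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class BulkIndexer {
    public static void main(String[] args) {
        // Placeholder cluster name and host, not our actual settings.
        TransportClient client = new TransportClient(ImmutableSettings.settingsBuilder()
                .put("cluster.name", "my-cluster").build());
        client.addTransportAddress(new InetSocketTransportAddress("es-node-1", 9300));

        // One batch of ~400 documents; we send roughly 10 batches per second.
        BulkRequestBuilder bulk = client.prepareBulk();
        for (int i = 0; i < 400; i++) {
            bulk.add(client.prepareIndex("events-2013.07.18", "event")
                    .setSource("{\"field\":\"value\"}"));
        }
        BulkResponse response = bulk.execute().actionGet();
        if (response.hasFailures()) {
            System.err.println(response.buildFailureMessage());
        }
        client.close();
    }
}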

Things work well for a little while (1-2 days), but eventually, the cluster
falls over -- see the heap usage chart from Graphite. This is for just one
host, but the memory behavior is identical across the two nodes.

[Inline image: heap usage chart from Graphite]

This looks like a memory leak to me. The logs don't reveal anything out of
the ordinary around the time heap usage started increasing linearly. A few
common sources of memory issues that I've already ruled out:

  • The cache. Since the workload is basically entirely indexing, the
    filter and field caches shouldn't even come into play here. Either way,
    both are configured to be limited in size, so this can't be it.
  • "Not enough heap" -- I can reproduce this no matter how much heap
    space I give my nodes.

Anyone seen this before, or any ideas as to what my issue might be here?
I've turned on TRACE logging for Lucene merges, but I don't think I'll get
any good data out of that until my cluster crashes again.
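
For what it's worth, polling the node stats through the same client should
make the heap growth visible over time without waiting for another crash.
Something along these lines (just a rough sketch; it assumes an existing
client, and the exact getters may need adjusting for 0.90.1):

import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;

// Prints per-node heap usage; "client" is an already-connected TransportClient.
NodesStatsResponse stats = client.admin().cluster().prepareNodesStats()
        .clear().setJvm(true).setIndices(true)
        .execute().actionGet();
for (NodeStats node : stats.getNodes()) {
    System.out.println(node.getNode() + " heap used: "
            + node.getJvm().getMem().getHeapUsed());
}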

Rafe


You don't tell us the ES version or the JVM heap memory you have
configured, so I assume standard settings? (The image is not viewable.)

I recommend increasing the heap, and also the following settings for
smoother indexing (lower peak resource usage):

index.merge.policy.max_merged_segment: 2g
index.merge.policy.segments_per_tier: 24
index.merge.policy.max_merge_at_once: 8
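
You can also try applying them to an existing index without a restart, e.g.
from the Java client (just a sketch -- "myindex" is an example name, and
whether each setting is picked up live depends on the ES version):

import org.elasticsearch.common.settings.ImmutableSettings;

// Assumes an existing Client "client"; updates the merge policy settings
// on the index "myindex".
client.admin().indices().prepareUpdateSettings("myindex")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.merge.policy.max_merged_segment", "2g")
                .put("index.merge.policy.segments_per_tier", 24)
                .put("index.merge.policy.max_merge_at_once", 8)
                .build())
        .execute().actionGet();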

And for better fault tolerance, consider a three-node cluster.

Jörg


Can you open an issue and share an example so we can recreate it and try
and chase it up?


Sorry for the incomplete information. I am using ES 0.90.1 with default JVM
memory settings, except for -Xmx16g and -Xms16g. The chart I tried to attach
before is at http://i.imgur.com/vYeDEsc.png; hopefully that link won't be
stripped out as well.

Jörg, could merging cause the types of pauses and memory usage I am seeing?
A few hours and like 6GB seems unreasonable for a merge, and also seems
really unlikely based on my experience with Lucene.

Shay, I will work on submitting a ticket.

Rafe


Rafe, just curious…what size are your indexes? Do you cap them and create a new index once the size is reached? Or are you indexing into a single index?
-Vinh


Vinh, I am creating one index per day's worth of data (so there is no size
threshold for creating a new index). My indices are ~500 million docs, 75GB
on disk (for the primary shards).
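
For what it's worth, the index name is just derived from the event date,
something like this (a sketch -- the "events-" prefix and date format are
placeholders, not our real naming scheme):

import java.text.SimpleDateFormat;
import java.util.Date;

// Daily index name, e.g. "events-2013.07.18"; "client" is the TransportClient.
SimpleDateFormat indexDate = new SimpleDateFormat("yyyy.MM.dd");
String indexName = "events-" + indexDate.format(new Date());
String json = "{\"field\":\"value\"}";
client.prepareIndex(indexName, "event").setSource(json).execute().actionGet();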

Rafe
