Upgrade to ES 1.4 memory issues / tuning max_merged_segment

Hi all,

I have been testing an upgrade to elasticsearch 1.4 beta1.

We use the Bulk API along with scripts to perform upserts into
elasticsearch. These perform well under ES 1.2 without any tuning.

However, in ES 1.4 beta1, running these upsert scripts often leads to:
java.lang.OutOfMemoryError: Java heap space

We use the bulk API:

curl -iL --silent --show-error -XPOST 'localhost:9200/_bulk' --data-binary @./<file_name>

where the file contains about 130 MB (10,000 to 250,000 lines) of data.
It is filled with update / script commands:

{"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
{"doc":

{"type":"event","date_time":"2014-10-17T19:00:00Z","day":20141017,"impression_cost":0.005,"format":"xyz","impression":1,"referer":"xyz","browser":"xyz","os":"android
4.4.4","device":"nexus
4","channel":"mobile","x_name":"xyz","id":"97bc142e15c7136ebe866890e03dfad9"
},"doc_as_upsert":true
}

{"update":{"_index":"2762_2014_41","_type":"event","_id":"97bc142e15c7136ebe866890e03dfad9"}}
{
"script":"if( ctx._source.containsKey("impression") ){
ctx._source.impression += 2; } else { ctx._source.impression = 2; };"
}

There were some issues with permgen taking up memory ("Reduce permgen use
from Groovy scripts", elasticsearch issue #7658) that have been addressed
since the beta1 release, so we re-built from the 1.4 branch.

I also found this discussion about an OOM error that suggested setting
max_merged_segment in elasticsearch.yml:
https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/max_merged_segment/elasticsearch/ETjvBVUvCJs/ZccfzUIFAKoJ

index.merge.policy.max_merged_segment: 1gb
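
As a side note, it looks like the same limit can also be applied to a live
index through the index settings API; I have only used the elasticsearch.yml
route myself, so take this as an untested sketch (the index name is just our
example index):

curl -XPUT 'localhost:9200/2762_2014_41/_settings' -d '{
  "index.merge.policy.max_merged_segment": "1gb"
}'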

Setting max_merged_segment, launching on my development machine with a 2 GB
heap (ES_HEAP_SIZE=2g ./bin/elasticsearch), and bringing the file size per
bulk request down to about 25 MB stabilized the system.
However, it would still heap dump when larger files of around 130 MB were sent.
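
For reference, this is roughly how we chop the big file into smaller bulk
requests (a rough sketch; the 50,000-line chunk size is just what happens to
land near 25 MB for our data, and it has to stay an even number so that
action and source lines remain paired):

split -l 50000 <file_name> bulk_chunk_
for f in bulk_chunk_*; do
  curl --silent --show-error -XPOST 'localhost:9200/_bulk' --data-binary @"$f"
done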

I don't fully understand how this fixed the memory issues. Would anyone be
able to provide some insight into why we would run into memory issues with
the upgrade?
I'd like to better understand how memory is managed here so that I can
support this in production. Are there recommended sizes for bulk requests?
And how do those relate to the max_merged_segment size?

Thanks,
Dave


There are several areas of memory that Elasticsearch uses when receiving
large bulks over HTTP:

  • Netty buffers (HTTP chunking etc.)

  • bulk source (the lines are split into portions for each primary shard)

  • memory for analyzing/tokenizing the fields in the source

  • translog buffer (ES write ahead logging)

  • indexing buffer (Lucene NRT etc.)

The longer the bulk runs, the more these areas compete for the 2 GB heap.
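
If you want to watch that competition while a bulk is running, the node
stats API shows the heap as well as the indexing and translog figures, for
example:

curl 'localhost:9200/_nodes/stats/jvm,indices?pretty'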

If you run sustained bulk requests for some time (say 15-20 minutes), ES
picks up the segments created on disk and merges them into larger ones to
maintain performance.

Reducing max_merged_segment from the default of 5gb to 1gb has two effects.
It allows a merge step to complete faster, because the volume of each merged
segment is limited, and it takes some of the pressure off the heap as
segments grow larger and larger. The downside is that merge steps are
executed more frequently.
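
You can check the effect on segment sizes with the segments API of the
index, for example (using the index name from your bulk request):

curl 'localhost:9200/2762_2014_41/_segments?pretty'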

You are correct, bulk requests of around 1-10 MB should work fine for most
servers.

Bulk requests of 100 MB and larger have a strong effect on run time and on
the memory consumed by the other ES processing steps needed to index the
data, so they should be reduced in order to find a "sweet spot". The exact
point of optimal balance between bulk request size and indexing power also
depends on other factors, like I/O throughput and CPU (plus ES settings such
as store throttling).
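
Just to illustrate which knobs are involved (the values below are only
starting points, not recommendations), the relevant settings in
elasticsearch.yml would look like:

indices.store.throttle.max_bytes_per_sec: 20mb
indices.memory.index_buffer_size: 10%
index.translog.flush_threshold_size: 200mb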

Jörg
