Merging of segments results in java.lang.OutOfMemoryError: Java heap space

Hi @Srinath_C

I tend to recommend resetting everything to default because people have usually cargo-culted settings and removing them clears a lot of noise. Like I said, they're expert settings, but Mike is pretty much as expert as they get :slight_smile:

Also, +1 for actually testing the difference instead of just blindly changing things. Be aware that the implementation of these things changes from time to time, so I would revisit any settings you have when you move to later versions.

Did my other suggestions resolve the OOMs you were having?

Hi @Clinton_Gormley, the tests are still in progress. I'll report back with my findings.

Actually, I haven't done a lot of testing that combines sustained indexing activity with query activity. Maybe your suggestions will make more sense once there's a lot of data and queries running over it. I'm also trying the noop I/O scheduler; I'll report back on that.

The OOMs didn't happen again; I'm attributing that to the reduced bulk sizes and concurrency levels. Hopefully they don't show up again when I run the tests with queries.

Hi @mikemccand, @Clinton_Gormley,

On a new test setup, I'm unable to get consistent performance from my cluster.
This cluster was set up with all the learnings from the previous runs: 3 master nodes (c3.large instances) and 5 data nodes (m3.xlarge instances). After ingesting documents at a rate of 4k per second for about 7 hours, we started to see long GC pauses again. After that, we were never able to index records at that rate consistently.

Finally, the nodes ended up producing heap dumps. I have captured the suspected leaks using the Memory Analyzer Tool. Please take a look at the reports here. Output of the _nodes API is available here.

Just to summarize: the cluster is on version 1.7.1, with 3 dedicated master nodes, 5 data nodes, and 3 external NodeClient processes bulk inserting with concurrency 1 using BulkProcessor. Each bulk is capped at 5MB, 5000 documents, or a 10-second flush interval, whichever is reached first.
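For reference, this is a minimal sketch of how that BulkProcessor is configured with the 1.x Java client. The class name and listener bodies are placeholders; only the limits mirror what I described above:

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

public class BulkSetup {
    // Builds a BulkProcessor that flushes at 5000 documents, 5 MB, or every
    // 10 seconds (whichever comes first), with one bulk request in flight.
    public static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                // e.g. log request.numberOfActions()
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                if (response.hasFailures()) {
                    // inspect response.buildFailureMessage() for rejected/failed items
                }
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                // whole-bulk failure (e.g. node disconnect); log and consider retrying
            }
        })
        .setBulkActions(5000)                                // at most 5000 documents per bulk
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))  // or at most 5 MB per bulk
        .setFlushInterval(TimeValue.timeValueSeconds(10))    // or flush every 10 seconds
        .setConcurrentRequests(1)                            // one concurrent bulk request
        .build();
    }
}
```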

I'm not sure what is missing in the new test. All configurations (elasticsearch.yml) seem to match the earlier setup where we had no issues.

Hi @Srinath_C

Could you upload a diagnostics dump somewhere? https://github.com/elastic/elasticsearch-support-diagnostics

thanks

Hi @Clinton_Gormley,

I hadn't had a chance to run these tests for quite some time. After looking into it further, it seems that over time the bulk queue kept growing, probably because the bulk insertions are asynchronous. CPU utilisation, though, was below 50% on each node. The requests piling up in the bulk queues are probably what's causing the frequent, long GC pauses.
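In case it's useful, this is roughly how I've been checking the queue growth from the Java client (a sketch assuming the 1.x node/transport client API; the same numbers are also visible via the _cat/thread_pool endpoint):

```java
import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.threadpool.ThreadPoolStats;

public class BulkQueueCheck {
    // Prints the bulk thread pool queue depth and cumulative rejections per node.
    // A queue that keeps growing (or rejections that keep climbing) suggests the
    // cluster can't keep up with the asynchronous bulk requests being sent to it.
    public static void printBulkQueues(Client client) {
        NodesStatsResponse stats = client.admin().cluster()
                .prepareNodesStats()
                .setThreadPool(true)
                .execute().actionGet();
        for (NodeStats node : stats.getNodes()) {
            for (ThreadPoolStats.Stats pool : node.getThreadPool()) {
                if ("bulk".equals(pool.getName())) {
                    System.out.println(node.getNode().getName()
                            + " bulk queue=" + pool.getQueue()
                            + " rejected=" + pool.getRejected());
                }
            }
        }
    }
}
```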

Would that be a convincing assessment? I could upload the diagnostics output, but it will need some scrubbing as it captures all the AWS keys/secrets and IP addresses.

I've tried synchronous bulk inserts, but I'm seeing high and variable response times; I'll probably bring that up in a different thread.

Thanks.

Very nice thread, glad I read through it. @Clinton_Gormley's comments on bulk explain some things I had seen in the past. Coming from Solr, I initially started with very large batches, which worked well in that environment. However, I found with ES that while bulk definitely improved things, there were measurable diminishing returns from large batch sizes, and I settled on a relatively small batch size of 10 documents. This has continued to work well, and the discussion above explains why.

One thought that I don't see mentioned here regards your EC2 instance size. I would recommend that you at least try a cluster of m3.2xlarge data nodes at half the node count of your current cluster, either 2 or 3, maybe both in separate tests. The reason is that the OS has some memory overhead that doesn't grow with the amount of memory in the box. When sizing I generally assume this to be around 2GB on modern Linux; that may be high, but it has worked well for me. If you consider the ratio of 2GB to 15GB on your m3.xlarge, that is about 13%, whereas on an m3.2xlarge it is 2 out of 30, or about 7%. This means with m3.xlarge you sacrifice roughly 13% of your memory with each new node, whereas with m3.2xlarge you sacrifice only about 7%, leaving more memory available for caching and heap. With the smaller instance sizes, 8GB or less, I don't think you should allocate 50% to heap, but rather closer to (8GB - 2GB)/2.
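To make that arithmetic concrete, here's a rough sketch of the rule of thumb I'm describing. The 2GB OS reservation and the cutoff at 8GB are my own assumptions, not hard rules:

```java
public class HeapSizing {
    private static final double OS_RESERVE_GB = 2.0; // rough OS overhead assumption

    // Rule of thumb from above: on small boxes (<= 8 GB) give Elasticsearch
    // half of what's left after the OS reservation; on larger boxes the usual
    // 50%-of-RAM guidance applies, leaving the rest to the filesystem cache.
    public static double suggestedHeapGb(double totalRamGb) {
        if (totalRamGb <= 8.0) {
            return (totalRamGb - OS_RESERVE_GB) / 2.0; // e.g. 8 GB -> 3 GB heap
        }
        return totalRamGb / 2.0;                       // e.g. 30 GB -> 15 GB heap
    }

    public static void main(String[] args) {
        System.out.println("m3.xlarge  (15 GB): heap ~" + suggestedHeapGb(15) + " GB, OS overhead ~13%");
        System.out.println("m3.2xlarge (30 GB): heap ~" + suggestedHeapGb(30) + " GB, OS overhead ~7%");
    }
}
```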

I recently made a similar shift myself, moving from c3.xlarge to m3.xlarge, and the results more than justified the additional expense. My cluster acts like it's twice the size; I look forward to being able to justify the move to m3.2xlarge.

Thanks for the input, @aaronmefford. I'll see if I can quickly run through those tests.

A couple of questions:
a, Can you provide more details on your bulk sizes? Size, frequency, synchronous/asynchronous, etc. I'm facing some challenges with the response times from bulk indexing.
b, m3.2xlarge has 8 cores whereas m3.xlarge has 4. Do you think the number of cores made the difference, or are you convinced the extra heap space is the truly differentiating factor?
c, Were you able to figure out what goes into the 2GB of space? It would appear that instance types like c3.large (4GB memory) aren't even a good choice for Elasticsearch.

A couple of questions:
a, Can you provide more details on your bulk sizes? Size, frequency, synchronous/asynchronous, etc. I'm facing some challenges with the response times from bulk indexing (https://discuss.elastic.co/t/bulk-import-response-times/27918).

Sorry, I don't currently have solid numbers on response times in this area. My indexing is the reduce phase of a Map/Reduce job, so it is parallel, but I limit the number of reducers to 1 or 2 per node in the cluster. I was looking at the code today and it is currently configured at 100 messages per batch. It was a few years ago that I did substantial testing of batch sizes and found the diminishing returns with larger batches.

b, m3.2xlarge has 8 cores whereas m3.xlarge has 4. Do you think the number of cores made the difference, or are you convinced the extra heap space is the truly differentiating factor?

In my case I moved from c3.xlarge to m3.xlarge, so memory was the real difference; the cluster size did not change. In your case I think you'd achieve a similar result by reducing the number of nodes when you step up to m3.2xlarge or m4.2xlarge. This will keep your costs and core count similar but double your RAM per node, improving the efficiency of that RAM.

c, Were you able to figure out what goes into the 2GB of space? It would appear that instance types like c3.large (4GB memory) aren't even a good choice for Elasticsearch.

The 2GB is just a rule of thumb that I use as a buffer. It is loosely based on other best practices I've encountered over the years, but more so on the reality that the OS needs some resources too, and if you don't leave enough for the OS you will have issues. Also, remember that anything the OS doesn't use directly it will use for file caching, which significantly benefits Elasticsearch as well, so you aren't really losing anything by leaving this buffer. I have run Elasticsearch on smaller instances; it works for small projects with small requirements. But yes, I would strongly recommend against anything under 8GB, especially if you have more than 2 nodes. I.e., scale up a bit before scaling out. A 16-node cluster of m3.mediums is nonsense. In my consulting practice I strongly recommend targeting 64GB per node, and scaling up instead of out until you reach that size. I really wish AWS wasn't so stingy with memory, but that is a common challenge in all virtualized environments.