High heap during indexing documents

szarik · February 20, 2017, 10:46am

Hello,

We are using Elasticsearch to store messages from our internal messenger. Currently we have around 7 billion messages to import into Elasticsearch.
During the import we collect additional data from our internal API and prepare bulk requests to ES.

What we have:

Elasticsearch 2.4.1
1 cluster with 8 nodes, where one is the master and the others are master-eligible nodes:

virtualization (we do not use bare metal)
JVM 1.8.0_121 (OpenJDK)
64 GB RAM, of which 31 GB is set for the JVM
1.1 TB SSD storage
1 GHz 16-core processor
8 shards + 1 replica

We are working with monthly indices and currently need around 140 indices for messages. We also keep additional data, such as user information, in 3 additional indices.

Where is the problem:

We have a problem with heap usage while importing lots of data. Currently we put around 8000 messages per second into Elasticsearch using multiple workers. A single worker adds 1000 messages in one bulk request (around 500 kB in size).
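
A minimal sketch of how one worker might assemble such a bulk body, assuming the messages are plain dicts. The index name `archive_2015-06` is taken from the logs further down; the type name `message`, the helper names and the sample fields are hypothetical. The action line carries no `_id`, so Elasticsearch generates it.

```python
import json
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_bulk_body(messages, index, doc_type):
    """Build the newline-delimited JSON body expected by the _bulk endpoint.

    Each document is preceded by an action line. No "_id" is given, so
    Elasticsearch assigns one itself.
    """
    lines = []
    for msg in messages:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(msg))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical sample data standing in for the real messages.
messages = ({"sender": i % 7, "body": "hello %d" % i} for i in range(2500))
bodies = [build_bulk_body(batch, "archive_2015-06", "message")
          for batch in chunked(messages, 1000)]
```

Each element of `bodies` would then be sent as one bulk request; how it is sent (HTTP client, retries, number of workers) is left out here.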

At the beginning everything looks good. After a few hours, when we have more than 10 million messages (around 9 GB on storage including replicas), heap usage on every node increases to 90% - 96% and stays at that level.
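
As a rough back-of-the-envelope check on these figures, the sketch below extrapolates storage and import time for the full data set. It assumes disk usage grows linearly with message count and that the indexing rate stays constant; neither is guaranteed in practice.

```python
# Figures quoted in the post.
messages_indexed = 10_000_000        # messages indexed when heap climbs
bytes_on_disk = 9 * 10**9            # ~9 GB on storage, replicas included
total_messages = 7_000_000_000       # messages to import in total
rate_per_second = 8000               # current indexing rate

# Assumption: disk usage grows linearly with message count.
bytes_per_message = bytes_on_disk / messages_indexed
projected_tb = total_messages * bytes_per_message / 10**12

# Assumption: the indexing rate stays constant for the whole import.
import_days = total_messages / rate_per_second / 86400

print("%.0f bytes per message" % bytes_per_message)   # 900
print("%.1f TB projected" % projected_tb)             # 6.3
print("%.1f days to import" % import_days)            # 10.1
```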

We tested many settings:

changed the garbage collector to G1GC - heap behaves well, but CPU usage increases dramatically, the JVM stops responding, one random node drops out of the cluster and everything hangs
changed thread pools - same result as with the garbage collector
set swappiness to 5 - a small improvement, but not enough
we get better results when we leave _id generation to ES instead of using our own int or uuid4

We still see log entries like these during search and insert:

Caused by: java.lang.OutOfMemoryError: Java heap space
[2017-02-09 00:49:56,690][DEBUG][action.search            ] [archive-es0] [archive_2015-06][2], node[DDLpT-CvQMukscvKIORzUw], [P], v[32], s[STARTED], a[id=lI_Pm7z5RAiOfkj-hUIIiQ]: Failed to execute [org.elasticsearch.action.search.SearchRequest@2d30088b] lastShard [true]
RemoteTransportException[[archive-es0][10.114.1.48:9300][indices:data/read/search[phase/query]]]; nested: ElasticsearchException[java.lang.OutOfMemoryError: Java heap space]; nested: ExecutionError[java.lang.OutOfMemoryError: Java heap space]; nested: OutOfMemoryError[Java heap space];
Caused by: ElasticsearchException[java.lang.OutOfMemoryError: Java heap space]; nested: ExecutionError[java.lang.OutOfMemoryError: Java heap space]; nested: OutOfMemoryError[Java heap space];

Has anyone had similar problems?

warkolm · February 20, 2017, 9:15pm

Do you have Monitoring installed to see what is happening?

szarik · March 1, 2017, 7:16am

Yes, I had it installed.

I found and resolved the problem: I had the thread pool size set to 32 (the number of hardware threads) instead of 16 (the number of cores).

What's more, I gave up on G1GC, so parameters like -XX:-UseParNewGC -XX:-UseConcMarkSweepGC -XX:+UseG1GC were removed.

Setting the parameter -XX:CMSInitiatingOccupancyFraction=50 also helped.
From here:

The Throughput Collector starts a GC cycle only when the heap is full, i.e., when there is not enough space available to store a newly allocated or promoted object. With the CMS Collector, it is not advisable to wait this long because the application keeps on running (and allocating objects) during concurrent GC.

Right now I do not have any problems with the JVM, heap or CPU during the indexing process.
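
For anyone who wants to check the same thing on their own cluster, here is a minimal sketch that picks out nodes with high heap usage from a nodes stats response (`GET _nodes/stats/jvm`), which reports `heap_used_percent` under `jvm.mem` for each node. The sample response, the node IDs, the node names other than `archive-es0`, and the 90% threshold are all hypothetical.

```python
def nodes_over_threshold(stats, threshold_percent):
    """Return (node name, heap %) pairs at or above the threshold, highest first."""
    hot = []
    for node in stats["nodes"].values():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold_percent:
            hot.append((node["name"], used))
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


# Hypothetical, trimmed-down response; a real one has many more fields.
sample_stats = {
    "nodes": {
        "node-id-1": {"name": "archive-es0", "jvm": {"mem": {"heap_used_percent": 96}}},
        "node-id-2": {"name": "archive-es1", "jvm": {"mem": {"heap_used_percent": 47}}},
        "node-id-3": {"name": "archive-es2", "jvm": {"mem": {"heap_used_percent": 91}}},
    }
}

hot_nodes = nodes_over_threshold(sample_stats, 90)
```

With the sample data above, `hot_nodes` contains `archive-es0` and `archive-es2`, in that order.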

Nelutu_Armenean · March 15, 2017, 8:24am

Thank you. You saved me. I removed the -XX:Use.....GC settings and also set the -XX:CMSInitiatingOccupancyFraction=50 (it was 75 before) and now I don't get that out of memory problem anymore.

