Outofmemory exception on ES 1.7.0

kkomv · August 5, 2015, 4:45pm

Hi, we are seeing an out-of-memory exception on the Elasticsearch JVM; it runs out of heap space after a while.

We are using Elasticsearch 1.7.0 with Java 1.7.0_79. When we looked at the heap dump, we could see that around 80% of the retained heap was held by around 1.7 million ClusterState objects. Any pointers on why the heap is being retained by these objects and not released? How often are ClusterState objects created?

We are running a 4-node ES cluster, and each ES JVM runs with a heap size of 8g:

ES_HEAP_SIZE=8g

Any help is much appreciated. Thanks.

Class Name                                                                                                                    | Shallow Heap | Retained Heap | Percentage
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor @ 0x750acc598                                        |           88 | 3,281,330,168 |     81.40%
|- java.util.concurrent.PriorityBlockingQueue @ 0x750b96760                                                                   |           40 | 3,281,329,448 |     81.40%
|  |- java.lang.Object[1707954] @ 0x7f670bc00                                                                                 |    6,831,832 | 3,281,329,304 |     81.40%
|  |  |- org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable @ 0x74fc721b8|           40 |       532,728 |      0.01%
|  |  |- org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable @ 0x74fc6fc38|           40 |       529,784 |      0.01%
|  |  |- org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable @ 0x74fc70eb0|           40 |       521,008 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x708454380                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x7e6174ff8                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x7bd933908                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x7cacb4780                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x73c460d50                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x7292082e8                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x73b610838                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x79fadbb98                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x7200bf040                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x7306d3430                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x7146e6fb0                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x72b5c88d0                                                                 |           56 |       400,104 |      0.01%    
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x724eac8f8                                                                 |           56 |       400,104 |      0.01%
|  |  |- org.elasticsearch.cluster.ClusterState @ 0x71d312540                                                                 |           56 |       400,104 |      0.01%
|  |  '- Total: 25 of 1,435,266 entries; 1,435,241 more                                                                       |              |               |           
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

warkolm · August 6, 2015, 2:34am

You don't happen to have a lot of mappings, do you?

kkomv · August 6, 2015, 3:48am

No, just 2: the default and a custom mapping. I do, however, see an update_mapping [logs] (dynamic) INFO log entry whenever a new index is created; indices are created on an hourly basis.

warkolm · August 6, 2015, 4:01am

That's a lot of indices, why so many?

mosiddi · August 6, 2015, 4:21am

If you have a lot of indices, your cluster state will be huge and will consume a lot of memory. Remember that in ES, cluster state is synced across nodes frequently and every node maintains a copy, so keeping it light is important. If you have Marvel enabled, the large number of indices will affect that as well: the Marvel client will keep sending index stats for all those indices and consume a lot of memory.

kkomv · August 6, 2015, 3:12pm

We create an index every hour because of the log volume; our hourly index sizes frequently exceed 5GB. We ran this setup successfully on ES 1.2, but we are now seeing an issue after upgrading to 1.7.

warkolm · August 6, 2015, 11:05pm

5GB for an index is not big, I'd just stick with a daily one with (at least) 4 shards.

To add to that, if you have the default of 5 shards and 1 replica, you have 240 shards per day (24 hourly indices x 5 primaries x 2 copies), which is a lot. And a shard is a Lucene instance that requires resources to be maintained. Oversharding, which is what you are doing, is going to be playing a part in this OOM.
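
As a rough sketch of that arithmetic (the function name and parameters below are made up for illustration and are not part of any Elasticsearch API), the number of shard copies created per day can be worked out like this:

```python
def shards_per_day(indices_per_day, primary_shards=5, replicas=1):
    """Total shard copies (primaries plus replicas) created per day.

    Hypothetical helper for illustration only; the defaults mirror the
    5 shards / 1 replica defaults mentioned above.
    """
    return indices_per_day * primary_shards * (1 + replicas)


# Hourly indices with the default settings.
hourly = shards_per_day(24)

# One daily index with 4 primary shards and 1 replica.
daily = shards_per_day(1, primary_shards=4)

print("hourly indices:", hourly, "shards/day")
print("daily index:   ", daily, "shards/day")
```

Under these assumptions the hourly layout creates 240 shards per day, while a single daily index with 4 primaries creates 8.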

kkomv · August 6, 2015, 11:57pm

Thanks for the response @warkolm. I will try indexing on a daily basis. Also, when I said 5GB as an index size, that is the size for one hour, so if we index on a daily basis the index size could reach 120GB in the worst case. Would that cause any other problem?
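
For reference, a minimal sketch of that sizing estimate, assuming a steady 5GB per hour and the 4 primary shards suggested earlier in the thread (the helper name is hypothetical, and real shard sizes will vary with the data):

```python
def daily_index_size_gb(hourly_gb, hours=24):
    """Worst-case size of one daily index, given the hourly volume in GB."""
    return hourly_gb * hours


def size_per_primary_gb(index_gb, primary_shards):
    """Average size of each primary shard, assuming documents spread evenly."""
    return index_gb / primary_shards


daily_gb = daily_index_size_gb(5)                  # 5GB/hour over 24 hours
per_shard_gb = size_per_primary_gb(daily_gb, 4)    # split across 4 primaries

print(daily_gb, "GB per daily index")
print(per_shard_gb, "GB per primary shard")
```

With these numbers a daily index comes to 120GB, or about 30GB per primary shard if it is split four ways.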

warkolm · August 7, 2015, 12:13am

Nope, that's fine.

mosiddi · August 11, 2015, 3:20pm

Seems reasonable to me as well
