Large heap usage with each node


(None) #1

Hi, running...

ES 1.5.2
Java 1.8_45
Windows 2008
4 nodes of: 32 cores 128GB 5 TB SSDs (each)

ES_HEAP_SIZE configured to 30g

I just finished bulk indexing 380,000,000 records, but all 4 nodes are sitting at 60% heap usage and not collecting. I have YourKit running on 1 node and I tried forcing GC, but nothing went down.
I know -XX:+DisableExplicitGC is turned on in the .bat files, but with YourKit I'm still able to force a collection. I was able to collect some memory before, but not anymore.

Here is what is in the logs when I force GC from YourKit...

[ES xxx 01-01 (xxxx)] [gc][young][87406][39154] duration [1.6s], collections [1]/[2.5s], total [1.6s]/[17h], memory [18.1gb]->[17.1gb]/[30gb], all_pools {[young] [1gb]->[32mb]/[0b]}{[survivor] [112mb]->[128mb]/[0b]}{[old] [16.9gb]->[16.9gb]/[30gb]}
[ES xxx 01-01 (xxxx)] [gc][young][87489][39155] duration [1.6s], collections [1]/[1.8s], total [1.6s]/[17h], memory [18.4gb]->[17.1gb]/[30gb], all_pools {[young] [1.3gb]->[0b]/[0b]}{[survivor] [128mb]->[128mb]/[0b]}{[old] [16.9gb]->[16.9gb]/[30gb]}
[ES xxx 01-01 (xxxx)] [gc][old][87496][3] duration [43.2s], collections [1]/[44.2s], total [43.2s]/[1.3m], memory [17.2gb]->[14.9gb]/[30gb], all_pools {[young] [136mb]->[0b]/[0b]}{[survivor] [128mb]->[0b]/[0b]}{[old] [16.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 (xxxx)] [gc][young][87576][39156] duration [1.6s], collections [1]/[2.7s], total [1.6s]/[17h], memory [16.3gb]->[15gb]/[30gb], all_pools {[young] [1.3gb]->[0b]/[0b]}{[survivor] [0b]->[80mb]/[0b]}{[old] [14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 (xxxx)] [gc][young][87680][39157] duration [1.5s], collections [1]/[2.1s], total [1.5s]/[17h], memory [16.3gb]->[14.9gb]/[30gb], all_pools {[young] [1.3gb]->[0b]/[0b]}{[survivor] [80mb]->[32mb]/[0b]}{[old] [14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 (xxxx)] [gc][young][87770][39158] duration [1.6s], collections [1]/[1.9s], total [1.6s]/[17h], memory [16.4gb]->[14.9gb]/[30gb], all_pools {[young] [1.4gb]->[0b]/[0b]}{[survivor] [32mb]->[24mb]/[0b]}{[old] [14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 (xxxx)] [gc][young][87861][39159] duration [1.7s], collections [1]/[2.7s], total [1.7s]/[17h], memory [16.4gb]->[14.9gb]/[30gb], all_pools {[young] [1.4gb]->[0b]/[0b]}{[survivor] [24mb]->[24mb]/[0b]}{[old] [14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 (xxxx)] [gc][young][87953][39160] duration [1.5s], collections [1]/[1.9s], total [1.5s]/[17h], memory [16.3gb]->[14.9gb]/[30gb], all_pools {[young] [1.3gb]->[0b]/[0b]}{[survivor] [24mb]->[24mb]/[0b]}{[old] [14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 (xxxx)] [gc][young][88043][39161] duration [1.6s], collections [1]/[1.9s], total [1.6s]/[17h], memory [16.4gb]->[14.9gb]/[30gb], all_pools {[young] [1.4gb]->[0b]/[0b]}{[survivor] [24mb]->[32mb]/[0b]}{[old] [14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 (xxxx)] [gc][old][88079][4] duration [37.9s], collections [1]/[38.1s], total [37.9s]/[2m], memory [15.5gb]->[14.9gb]/[30gb], all_pools {[young] [544mb]->[8mb]/[0b]}{[survivor] [32mb]->[0b]/[0b]}{[old] [14.9gb]->[14.9gb]/[30gb]}

As you can see, I forced it twice and not much got collected...
I'm also using doc_values as much as possible. The field data and filter caches are minimal: about 5 MB and 30 MB per node respectively.

Is it the memory mapped files that are taking up the space? Right now the cluster is idle.
Does this mean I need to add more nodes to lower the memory consumption?


(Emptyemail) #2

What's the total memory available on the server?


(None) #3

As mentioned in the post, each server has 128 GB RAM, and each node is configured with ES_HEAP_SIZE=30g.

Currently each node is using about 15g of the 30g heap allocated, but not collecting any of it, as shown above. Someone on SO suggested segments are taking up the heap space, but so far optimize doesn't seem to do anything.

Here are the segment stats for one node...

"segments": {
  "count": 18591,
  "memory_in_bytes": 6164548110,
  "index_writer_memory_in_bytes": 0,
  "index_writer_max_memory_in_bytes": 4691267670,
  "version_map_memory_in_bytes": 0,
  "fixed_bit_set_memory_in_bytes": 0
}

That's for 1 node; it's about the same for the rest.
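As a quick sanity check (a Python sketch using only the numbers pasted above), segment metadata alone accounts for roughly 5.7 GB of that node's heap, an average of about 324 KB per segment, so the ~18k segment count is a big part of the resident heap:

```python
# Numbers copied from the segment stats pasted above (one node).
segment_count = 18591
segments_memory_bytes = 6164548110  # "memory_in_bytes"

segment_heap_gb = segments_memory_bytes / 1024**3
per_segment_kb = segments_memory_bytes / segment_count / 1024

print(f"segment heap on this node: {segment_heap_gb:.1f} GB")
print(f"average per segment:       {per_segment_kb:.0f} KB")
```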


(Emptyemail) #4

Sorry about that, I saw the cores and the SSDs but missed the RAM.

I am not an ES expert, but all of my clusters typically take up 50% of RAM regardless of the number of nodes (within reason). With regular usage that number grows until it hits 80%, after which GC kicks in and it drops back down to 50% or so.

With the default settings, 30% of your RAM gets allocated to the field/filter caches, which would not get GCed without ES releasing them from the cache.

GC will only release memory that is not being referenced by ES. For example, the cache will remain in memory because ES sees no reason to clear it. As you query the cluster, some things will get removed from the cache and new things will get added; the stuff that's removed is what gets garbage collected.

I am not sure I see 50% of RAM being used as a problem unless you have other symptoms. Also, the more shards you have, the more memory you are likely to use.

Have you looked at node diagnostics in the HQ plugin?


(None) #5

Yep! I'm not using field data. I moved almost all my data to doc values.

Field data is at 5mb per node.
Filter cache at 30mb per node.

Right now I have 32 shards per index, starting with 4 physical nodes, and we expect to grow in the next couple of years.

I have a couple of options here...
1- Look into optimizing the indexes to reduce the segment count.
2- Since I have fairly big boxes, add a second node per machine.

Right now I'm adding the second node, just waiting for rebalance to finish...


(None) #6

I also want to note that my plan is to go to 2.5 billion documents. I just stopped at 380 million because I saw I was running out of RAM, and at one point I actually did, so I had to rebuild the index.

So now I'm stopped at 380 million docs and checking my options before I proceed.


(Mark Walkom) #7

How many indices belong to those 32 shards?


(None) #8

It's actually 48 indices with 32 shards each, so there are 3142 shards including the replicas... I'm doing an index-per-day strategy.
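As a rough cross-check (a Python sketch, assuming the default of one replica per primary), that layout works out to about three thousand shards, in line with the count above:

```python
indices = 48
shards_per_index = 32
replicas = 1  # assumption: one replica per primary
nodes = 4     # 4 physical nodes

primaries = indices * shards_per_index
total_shards = primaries * (1 + replicas)
per_node = total_shards / nodes

print(f"{primaries} primaries, {total_shards} shards total, ~{per_node:.0f} per node")
```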

We expect to grow 150% next year, so I would expect the cluster to grow to a dozen physical nodes, maybe more.

So right now I'm trying to test with 2.5 billion documents, but I expect that to grow by 150% next year.

At 800 million docs I lost the cluster: there were way too many pauses and no collections happening, so it kept trying to recover over and over. So I restarted re-indexing and reached 380 million. Now I'm checking options...


(Mark Walkom) #9

That's the problem then. Each shard is a Lucene instance and requires resources to maintain.

Reduce that to a reasonable number and you should see better resource usage.
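To illustrate the point (a sketch using the doc count from earlier in the thread): 380 million documents spread over 48 daily indices at 32 shards each averages out to tiny shards, while 4 shards per index gives a much healthier size:

```python
docs = 380_000_000  # doc count from earlier in the thread
indices = 48        # daily indices

for shards_per_index in (32, 4):
    primaries = indices * shards_per_index
    print(f"{shards_per_index:>2} shards/index -> {primaries} primaries, "
          f"~{docs // primaries:,} docs per shard")
```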


(None) #10

So maybe 8 shards per index then. Hmmm, I just want to avoid having to re-index everything next year when we add more boxes.

Though if what you are saying is correct... given I have good boxes, I should also be able to spin up 2 nodes per physical machine, which would spread the shards and give more heap space back to each node as well... and decide on a good shard size too.


(Christian Dahlqvist) #11

Depending on the amount of data you are indexing each day, you may even go down to 4 shards per index. You generally want the shards of your daily indices to be reasonably large, and having a shard size of tens of GB is not uncommon.

There is also nothing that dictates that all daily indices must have the same number of shards, not even if you are using routing. You can therefore increase the number of shards for the next daily index to be created as volumes grow, and you do not need to reindex any data held in older indices.


(None) #12

Thanks all. I did some testing, and with my current daily size even a single shard is OK, but I'm going with 4 to get some parallelism with the 4 nodes. The mem usage is way down now.


(None) #13

So now I'm at 352 shards over 44 indexes and I still see 12GB of heap usage per node. I'm thinking the only way I can really get past 1 billion docs on the 4 nodes is to add an extra node per physical machine, since I have the horsepower, but keep the 4 shards + 1 replica config.

Btw, I did some tests with my daily averages and I could even use a single shard per index. But would that affect parallelism? I can also provide a YourKit snapshot of what is eating up the RAM...
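For reference, the shard count above can be reproduced with a quick sketch (assuming 1 replica, as configured): 44 indices × 4 shards × 2 copies = 352 shards, about 88 per node, so the 12 GB of heap works out to a rough average of ~140 MB per shard on a node:

```python
indices = 44
shards_per_index = 4
replicas = 1            # assumption: one replica, as configured above
nodes = 4
heap_gb_per_node = 12   # observed heap usage from above

total_shards = indices * shards_per_index * (1 + replicas)
shards_per_node = total_shards // nodes
mb_per_shard = heap_gb_per_node * 1024 // shards_per_node  # rough average only

print(f"{total_shards} shards, {shards_per_node} per node, "
      f"~{mb_per_shard} MB heap per shard")
```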


(Michael Ravits) #14

Hi javadevmtl,

I am experiencing a similar problem.
Did you ever resolve your problem?
Could you share the solution?

Thanks,
Michael


(Mark Walkom) #15

Please start your own thread :slight_smile:


(system) #16