0.90.11 stuck with high memory usage during bulk indexing and even hours after stopping


(Ankush Jhalani) #1

We have a single-node, 12 GB, 16-core ES instance into which 12 threads are
bulk indexing into a 12-shard index. Each thread sends a request ranging from
kilobytes to a couple of megabytes. The bulk thread pool queue_size has been
increased from the default 50 to 100.

With v0.90.11, we are noticing that JVM memory usage keeps growing slowly and
doesn't go down; GC runs frequently but doesn't free up much memory. From the
debug logs, it seems segment merges are happening. However, even after we stop
indexing, the instance stays busy doing segment merges for many hours. Here is
a sample gist from hot threads, run a couple of minutes apart:
https://gist.github.com/ajhalani/8976792. Even after 16 hours with little use
on the machine, JVM memory usage is about 80% (CMS should run at 75%), and
node stats show GC is running very frequently.

If we don't stop indexing, then after indexing 60-70 GB the instance
eventually goes out of memory. This looks like a memory leak; we didn't face
this issue with 0.90.7 (though we were probably using a 6-thread process for
bulk indexing).
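For context, here is a minimal sketch of the kind of multi-threaded bulk client described above. The index name, document shape, and thread pool wiring are assumptions for illustration, not our actual code; a real client would POST each body to the _bulk endpoint.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def build_bulk_body(docs, index="myindex", doc_type="doc"):
    """Serialize docs into the newline-delimited _bulk request format."""
    lines = []
    for doc in docs:
        # action/metadata line, then the document source line
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

def index_batch(docs):
    body = build_bulk_body(docs)
    # a real client would POST `body` to http://<host>:9200/_bulk here
    return len(body)

# 12 worker threads sending batches concurrently, mirroring the setup above
batches = [[{"n": i}] for i in range(12)]
with ThreadPoolExecutor(max_workers=12) as pool:
    sent = list(pool.map(index_batch, batches))
```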

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a1819d5f-caa3-4ac4-886f-5b560eada87a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ankush Jhalani) #2

Following is a histogram of the Java object heap, dumped using jmap -histo:

 num     #instances          #bytes  class name
   1:      1888557      4767284496  [B
   2:       198190      1892640192  [S
   3:       256130      1546754512  [I
   4:       119853      1198288048  [J
   5:       230881       121272272  [Lorg.apache.lucene.util.fst.FST$Arc;
   6:      1391854        93015760  [C
   7:       842721        60675912  org.apache.lucene.util.fst.FST$Arc
   8:      1122220        35911040  java.util.HashMap$Entry
   9:      1329252        31902048  java.lang.String
  10:       690285        27611400  java.util.TreeMap$Entry
  11:      1133480        27203520  org.apache.lucene.util.BytesRef
  12:       283533        26415040  [Ljava.util.HashMap$Entry;
  13:       229605        23878920  org.apache.lucene.util.fst.FST
  14:       259069        18856896  [Ljava.lang.Object;
  15:       229603        16531416  org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
  16:       266852        14943712  java.util.HashMap
  17:       230422        11060256  org.apache.lucene.index.FieldInfo



(Ankush Jhalani) #3

If it helps, following is jmap memory summary and object heap histogram
(https://gist.github.com/ajhalani/8977548)



(Jörg Prante) #4

There is no leak.

It is expected that ES uses all available memory.

Do you see any GC messages, or errors in the log?

Are you executing queries that use Lucene FSTs? Did you try to reduce memory
by disabling bloom filter loading (index.codec.bloom.load=false)?

Besides, I suggest updating the Java JVM; the one you are running is an older version.
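For anyone following along, that setting can be applied per index through the update-settings API. A sketch only; the host and index name are placeholders, and you should verify how your ES version applies codec settings:

```
curl -XPUT 'localhost:9200/myindex/_settings' -d '{
  "index.codec.bloom.load": false
}'
```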

Jörg



(Adrien Grand) #5

If you are indexing intensively, maybe merge throttling [1] is set to too low
a value, and background merging cannot keep up with the newly created
segments? Can you check how many segments your index has at the moment you
stop indexing?

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html#store-throttling
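The throttling settings from that page live in elasticsearch.yml (or can be changed via the cluster update-settings API). A sketch with example values only, not recommendations; tune for your hardware:

```
# store-level merge throttling; raise the cap (or set type to "none")
# so background merges can keep up with heavy bulk indexing
indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 50mb
```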


--
Adrien Grand



(Ankush Jhalani) #6

Thanks both for your input.

@Jörg:
I understand ES uses all available process memory. I meant the JVM memory
usage, which it tries to reclaim when it exceeds 75% (due to the
-XX:CMSInitiatingOccupancyFraction=75 option).
I don't know what kind of queries use Lucene FSTs; could you be kind enough
to explain? I also didn't know about the bloom filter and its memory usage;
is there a way to check how much memory it's adding?

I will update the JVM, but the issue is that the same bulk indexing was not
running the node out of memory on v0.90.7, while it is on v0.90.11.

@Adrien:
I will play with merge throttling to speed it up. But even after many hours,
once the merge operations had finished, the memory still wasn't reclaimed,
so I am more worried about that.

FYI, from the ES logs:

[2014-02-14 10:09:54,109][WARN ][monitor.jvm ] [machine1.node2] [gc][old][75611][2970] duration [43s], collections [1]/[44.1s], total [43s]/[55.5m], memory [11.3gb]->[10.6gb]/[11.8gb], all_pools {[young] [454.6mb]->[10.4mb]/[865.3mb]}{[survivor] [108.1mb]->[0b]/[108.1mb]}{[old] [10.8gb]->[10.6gb]/[10.9gb]}

And from a /_cluster/stats request:

"fielddata" : {
  "memory_size" : "3.6gb",
  "memory_size_in_bytes" : 3881191105,
  "evictions" : 0
},
"filter_cache" : {
  "memory_size" : "622.4mb",
  "memory_size_in_bytes" : 652677071,
  "evictions" : 0
},
"id_cache" : {
  "memory_size" : "2gb",
  "memory_size_in_bytes" : 2170019078
},
"completion" : {
  "size" : "0b",
  "size_in_bytes" : 0
},
"segments" : {
  "count" : 789,
  "memory" : "3.4gb",
  "memory_in_bytes" : 3730255779
}

If the node is running out of memory, shouldn't ES be reclaiming the id_cache
or fielddata?



(Ankush Jhalani) #7

I ran '_cache/clear', which cleaned up fielddata and id_cache; JVM memory
usage dropped from ~10.5 GB to ~5 GB.

Shouldn't ES itself clear these caches when JVM memory usage gets really
high? I saw the GC count kept increasing, but not much memory was reclaimed
until I ran _cache/clear.



(Binh Ly) #8

I don't know if this might help, but you can limit the max size of your
fielddata cache, as well as the expiry of items in that cache:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
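Those limits can be set in elasticsearch.yml. A sketch with placeholder values only; the defaults leave the cache unbounded, so any cap is a trade-off against evictions:

```
# example values only -- cap the fielddata cache and expire idle entries
indices.fielddata.cache.size: 30%
indices.fielddata.cache.expire: 10m
```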



(Ankush Jhalani) #9

It seems I was suspecting the wrong process of causing the memory issue; it
doesn't seem to be indexing, since the issue happened even after we stopped
it. I found out from the '_cluster/stats' and '_index/stats' APIs that one of
the existing indexes is taking most of the memory:

"filter_cache" : {
  "memory_size" : "252.2mb",
  "memory_size_in_bytes" : 264546840,
  "evictions" : 0
},
"id_cache" : {
  "memory_size" : "215.4mb",
  "memory_size_in_bytes" : 225963916
},
"fielddata" : {
  "memory_size" : "3.2gb",
  "memory_size_in_bytes" : 3479467264,
  "evictions" : 0
},
"completion" : {
  "size" : "0b",
  "size_in_bytes" : 0
},
"segments" : {
  "count" : 333,
  "memory" : "5.1gb",
  "memory_in_bytes" : 5561471705
}

I think to avoid confusion, I will open a separate thread to ask about it.



(system) #10