Hi folks,
After a few hours or days of uptime, our Elasticsearch cluster ends up spending
all of its time in GC, and we have to restart nodes to bring response times back
to where they should be. We're running G1GC with a 25 GiB heap on Java 8.
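For reference, the JVM is started with heap and collector options along these
lines (paraphrased from memory, so the exact set may differ slightly):

    -Xms25g -Xmx25g -XX:+UseG1GC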
In the GC logs, we just see lots of stop-the-world collections:
426011.398: [Full GC (Allocation Failure) 23G->22G(25G), 9.8222680 secs]
   [Eden: 0.0B(1280.0M)->0.0B(1280.0M) Survivors: 0.0B->0.0B Heap: 23.2G(25.0G)->22.6G(25.0G)], [Metaspace: 42661K->42661K(1087488K)]
 [Times: user=16.97 sys=0.01, real=9.82 secs]
426021.221: Total time for which application threads were stopped: 9.8237600 seconds
426021.221: [GC concurrent-mark-abort]
426022.226: Total time for which application threads were stopped: 0.0015720 seconds
426026.342: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 83886080 bytes, new threshold 15 (max 15)
 (to-space exhausted), 0.2428630 secs]
   [Parallel Time: 177.6 ms, GC Workers: 13]
      [GC Worker Start (ms): Min: 426026344.4, Avg: 426026344.7, Max: 426026344.9, Diff: 0.5]
      [Ext Root Scanning (ms): Min: 0.7, Avg: 0.9, Max: 1.0, Diff: 0.3, Sum: 11.4]
      [Update RS (ms): Min: 0.0, Avg: 3.1, Max: 5.5, Diff: 5.5, Sum: 40.1]
         [Processed Buffers: Min: 0, Avg: 10.5, Max: 28, Diff: 28, Sum: 136]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 170.5, Avg: 172.9, Max: 176.3, Diff: 5.7, Sum: 2248.3]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 1.7]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
      [GC Worker Total (ms): Min: 176.9, Avg: 177.1, Max: 177.4, Diff: 0.6, Sum: 2302.3]
      [GC Worker End (ms): Min: 426026521.8, Avg: 426026521.8, Max: 426026521.8, Diff: 0.0]
   [Code Root Fixup: 0.2 ms]
   [Code Root Migration: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 64.8 ms]
      [Evacuation Failure: 60.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.3 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.7 ms]
      [Free CSet: 0.3 ms]
   [Eden: 624.0M(1280.0M)->0.0B(1280.0M) Survivors: 0.0B->0.0B Heap: 23.2G(25.0G)->23.1G(25.0G)]
 [Times: user=0.81 sys=0.02, real=0.25 secs]
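For what it's worth, the logging above is produced by flags along these lines
(reconstructed from the output itself, so treat the list as approximate):

    -Xloggc:<path to gc.log> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
    -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution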
I've tried lowering the fielddata circuit-breaker limit on the cluster, but heap
usage does not change:
$ curl http://my-host:9200/_cluster/settings?pretty
{
  "persistent" : { },
  "transient" : {
    "indices" : {
      "fielddata" : {
        "breaker" : {
          "limit" : "40%",
          "overhead" : "1.2"
        }
      }
    }
  }
}
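Those transient values were set with something along these lines (I no longer
have the exact command, so this is approximate):

$ curl -XPUT "http://my-host:9200/_cluster/settings" -d '{
  "transient" : {
    "indices.fielddata.breaker.limit" : "40%",
    "indices.fielddata.breaker.overhead" : 1.2
  }
}'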
I'm going to look at indices.fielddata.cache.size and
indices.fielddata.cache.expire next, but those aren't dynamic settings (I've put
a rough sketch of what I'd add to elasticsearch.yml at the end of this mail).
Querying the node stats, I see only around 12 GiB attributed to field data:
$ curl "http://my-host:9200/_nodes/stats?pretty"
...
  "indices" : {
    ...
    "fielddata" : {
      "memory_size_in_bytes" : 12984041509,
      "evictions" : 0,
      "fields" : { }
    },
  },
  ...
  "fielddata_breaker" : {
    "maximum_size_in_bytes" : 10737418240,
    "maximum_size" : "10gb",
    "estimated_size_in_bytes" : 12984041509,
    "estimated_size" : "12gb",
    "overhead" : 1.2,
    "tripped" : 0
  }
  ...
Where should I look to see what Elasticsearch is doing with all of this heap?
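For the record, the static settings mentioned above would go into
elasticsearch.yml as something like the following; the values here are
placeholders I haven't settled on yet:

indices.fielddata.cache.size: 30%
indices.fielddata.cache.expire: 10m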