Elasticsearch fills the heap then spends all its time doing garbage collection

Hi folks

After a few hours/days of uptime, our elasticsearch cluster is spending all
its time in GC. We're forced to restart nodes to bring response times back
to what they should be. We're using G1GC with a 25 GiB heap on Java 8.
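
For reference, the JVM options behind that setup are along these lines (the GC log path is just an example):

  -Xms25g -Xmx25g
  -XX:+UseG1GC
  -XX:+PrintGCDetails
  -XX:+PrintGCTimeStamps
  -XX:+PrintGCApplicationStoppedTime
  -Xloggc:/var/log/elasticsearch/gc.log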

In the GC logs, we just see lots of stop-the-world collections:

426011.398: [Full GC (Allocation Failure) 23G->22G(25G), 9.8222680 secs]
   [Eden: 0.0B(1280.0M)->0.0B(1280.0M) Survivors: 0.0B->0.0B Heap: 23.2G(25.0G)->22.6G(25.0G)], [Metaspace: 42661K->42661K(1087488K)]
 [Times: user=16.97 sys=0.01, real=9.82 secs]
426021.221: Total time for which application threads were stopped: 9.8237600 seconds
426021.221: [GC concurrent-mark-abort]
426022.226: Total time for which application threads were stopped: 0.0015720 seconds
426026.342: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 83886080 bytes, new threshold 15 (max 15)
 (to-space exhausted), 0.2428630 secs]
   [Parallel Time: 177.6 ms, GC Workers: 13]
      [GC Worker Start (ms): Min: 426026344.4, Avg: 426026344.7, Max: 426026344.9, Diff: 0.5]
      [Ext Root Scanning (ms): Min: 0.7, Avg: 0.9, Max: 1.0, Diff: 0.3, Sum: 11.4]
      [Update RS (ms): Min: 0.0, Avg: 3.1, Max: 5.5, Diff: 5.5, Sum: 40.1]
         [Processed Buffers: Min: 0, Avg: 10.5, Max: 28, Diff: 28, Sum: 136]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 170.5, Avg: 172.9, Max: 176.3, Diff: 5.7, Sum: 2248.3]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 1.7]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
      [GC Worker Total (ms): Min: 176.9, Avg: 177.1, Max: 177.4, Diff: 0.6, Sum: 2302.3]
      [GC Worker End (ms): Min: 426026521.8, Avg: 426026521.8, Max: 426026521.8, Diff: 0.0]
   [Code Root Fixup: 0.2 ms]
   [Code Root Migration: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 64.8 ms]
      [Evacuation Failure: 60.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.3 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.7 ms]
      [Free CSet: 0.3 ms]
   [Eden: 624.0M(1280.0M)->0.0B(1280.0M) Survivors: 0.0B->0.0B Heap: 23.2G(25.0G)->23.1G(25.0G)]
 [Times: user=0.81 sys=0.02, real=0.25 secs]

I've tried lowering the fielddata circuit breaker limit on the cluster to reduce fielddata usage, but the heap usage does not change:

$ curl http://my-host:9200/_cluster/settings?pretty
{
  "persistent" : { },
  "transient" : {
    "indices" : {
      "fielddata" : {
        "breaker" : {
          "limit" : "40%",
          "overhead" : "1.2"
        }
      }
    }
  }
}
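
For completeness, that was set dynamically with something like:

$ curl -XPUT 'http://my-host:9200/_cluster/settings' -d '{
  "transient" : {
    "indices.fielddata.breaker.limit" : "40%",
    "indices.fielddata.breaker.overhead" : 1.2
  }
}'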

I'm going to look at indices.fielddata.cache.size and
indices.fielddata.cache.expire, but I can't set these dynamically. Querying
the node stats, only around 12 GiB seems to be from field data:

$ curl "http://my-host:9200/_nodes/stats?pretty"
...
"indices" : {
...
"fielddata" : {
"memory_size_in_bytes" : 12984041509,
"evictions" : 0,
"fields" : { }
},
},
...
"fielddata_breaker" : {
"maximum_size_in_bytes" : 10737418240,
"maximum_size" : "10gb",
"estimated_size_in_bytes" : 12984041509,
"estimated_size" : "12gb",
"overhead" : 1.2,
"tripped" : 0

Where should I look to see what elasticsearch is doing with all this heap
data?


On Wednesday, 17 December 2014 11:39:29 UTC, Mark Walkom wrote:

How many nodes, how much data and in how many indexes? What ES version?


On 17 December 2014 at 15:03, Wilfred Hughes wrote:

We're running three nodes (two data nodes and one non-data node) on ES 1.2.4, storing logstash data: 500 GiB in total across 49 indexes, with 5 shards per index.
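
(For reference, the _cat APIs give a quick view of this sort of layout, e.g.:

$ curl 'http://my-host:9200/_cat/indices?v'
$ curl 'http://my-host:9200/_cat/allocation?v'
)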


On Wednesday, 17 December 2014 15:04:17 UTC, Mark Walkom wrote:

Then you're quite possibly at the limits of what your heap/nodes can handle.

You can try adding more nodes (recommended), increasing your heap to a maximum of 31 GB, or removing or closing old indexes. If you are using time-based indexes, you can also try disabling the bloom filter to get a little memory back from older indexes, but it won't be much.
It should also be noted that each shard comes at a cost, so having 5 shards per index across two data nodes may be a bit of overkill.
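
For example, something along these lines (the index name is just a placeholder):

# Close an index you no longer query (it can be reopened later):
$ curl -XPOST 'http://my-host:9200/logstash-2014.10.01/_close'

# Stop loading bloom filters for an older index (1.x setting):
$ curl -XPUT 'http://my-host:9200/logstash-2014.10.01/_settings' -d '{
  "index.codec.bloom.load" : false
}'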

What sort of queries are you running?


On 17 December 2014, Wilfred Hughes wrote:

Thanks, that sort of feedback is invaluable.

We send JSON representing API calls to logstash, which forwards them to Elasticsearch. Users then use Kibana to run queries like "what are the most common values passed to this function?" or "how has the time taken by this function varied over time?". Users typically look at time ranges of one to seven days.
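
Concretely, those dashboards boil down to requests roughly like this one (the field name here is made up; Kibana may issue it as a facet rather than an aggregation, but either way the field gets loaded into fielddata):

$ curl 'http://my-host:9200/logstash-*/_search?pretty' -d '{
  "size" : 0,
  "query" : { "range" : { "@timestamp" : { "gte" : "now-7d" } } },
  "aggs" : {
    "most_common_values" : { "terms" : { "field" : "argument_value", "size" : 10 } }
  }
}'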

I'm happy to provide more details if that doesn't answer your question.
