Garbage collection log messages, [monitor.jvm ... duration [2.2m]

Wouter_van_Atteveldt · February 7, 2014, 3:33pm

I am using elasticsearch to index and query a fairly large document
collection. Most of the data is in a single property "text" of a doctype
"article". The index is sometimes slow, and my log has many messages about
the garbage collection:

For example, the following is right after starting the elasticsearch
process:

[2014-02-07 16:11:36,681][WARN ][monitor.jvm ] [Warwolves]
[gc][young][30][12] duration [1.1m], collections [11]/[3.1m], total
[1.1m]/[1.1m], memory [485.9mb]->[1.9gb]/[15.9gb], all_pools {[young]
[459.7mb]->[442.3mb]/[599mb]}{[survivor] [26.1mb]->[74.8mb]/[74.8mb]}{[old]
[0b]->[1.4gb]/[15.2gb]}
[2014-02-07 16:11:47,451][WARN ][monitor.jvm ] [Warwolves]
[gc][young][34][13] duration [7.4s], collections [1]/[7.7s], total
[7.4s]/[1.2m], memory [2gb]->[1.6gb]/[15.9gb], all_pools {[young]
[594.1mb]->[8.9mb]/[599mb]}{[survivor] [74.8mb]->[74.8mb]/[74.8mb]}{[old]
[1.4gb]->[1.5gb]/[15.2gb]}
[2014-02-07 16:12:06,311][WARN ][monitor.jvm ] [Warwolves]
[gc][young][41][15] duration [3.3s], collections [1]/[3.4s], total
[3.3s]/[1.3m], memory [2.3gb]->[1.9gb]/[15.9gb], all_pools {[young]
[562.1mb]->[8.5mb]/[599mb]}{[survivor] [74.8mb]->[74.8mb]/[74.8mb]}{[old]
[1.7gb]->[1.8gb]/[15.2gb]}
[2014-02-07 16:16:52,440][WARN ][monitor.jvm ] [Warwolves]
[gc][young][42][33] duration [2.2m], collections [18]/[4.7m], total
[2.2m]/[3.5m], memory [1.9gb]->[4.1gb]/[15.9gb], all_pools {[young]
[8.5mb]->[72.5mb]/[599mb]}{[survivor] [74.8mb]->[74.8mb]/[74.8mb]}{[old]
[1.8gb]->[4gb]/[15.2gb]}

IIUC, the last gc took 2.2 minutes, which indeed feels quite long?
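
To compare these warnings over time, it can help to pull the numbers out of each line. The following is a minimal sketch in Python, assuming the log layout matches the sample lines above; the regular expression and the helper names (`parse_gc_line`, `_scaled`) are illustrative and not part of Elasticsearch.

```python
import re

# Regex for the [monitor.jvm] GC warnings quoted above. The layout is inferred
# from these sample lines only, so treat it as an assumption.
GC_LINE = re.compile(
    r"\[gc\]\[(?P<kind>\w+)\]\[\d+\]\[\d+\]\s+"
    r"duration\s+\[(?P<duration>[^\]]+)\].*?"
    r"memory\s+\[(?P<before>[^\]]+)\]->\[(?P<after>[^\]]+)\]/\[(?P<max>[^\]]+)\]",
    re.DOTALL,
)

TIME_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}
SIZE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def _scaled(text, units):
    """Convert a value such as '2.2m' or '15.9gb' using the given unit table."""
    number, unit = re.fullmatch(r"([\d.]+)([a-z]+)", text).groups()
    return float(number) * units[unit]


def parse_gc_line(line):
    """Return GC kind, duration in seconds and heap sizes in bytes, or None."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "duration_s": _scaled(match.group("duration"), TIME_UNITS),
        "heap_before": _scaled(match.group("before"), SIZE_UNITS),
        "heap_after": _scaled(match.group("after"), SIZE_UNITS),
        "heap_max": _scaled(match.group("max"), SIZE_UNITS),
    }


if __name__ == "__main__":
    sample = (
        "[2014-02-07 16:16:52,440][WARN ][monitor.jvm ] [Warwolves] "
        "[gc][young][42][33] duration [2.2m], collections [18]/[4.7m], total "
        "[2.2m]/[3.5m], memory [1.9gb]->[4.1gb]/[15.9gb]"
    )
    info = parse_gc_line(sample)
    print(info["kind"], round(info["duration_s"]), "seconds")
```

Comparing `heap_after` with `heap_max` across many lines gives a rough picture of whether collections are actually freeing memory.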

Index size as reported by head:
size: 107G (107G)
docs: 44,832,514 (51,560,620)

I start elastic using ES_HEAP_SIZE=16g and (redundantly?) with arguments
-Xms2G -Xmx16G. The machine is a virtual guest with 48GB memory, and
elastic is running alongside an nginx+uwsgi+django stack.

Except for some logging thresholds, config is unchanged from download. I am
using version 0.90.10.

Are the gc messages indicative of a problem? Should I change the
configuration?

Thanks,

-- Wouter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bce5f34d-e1af-4059-8e65-b5302ee13ce0%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Binh_Ly · February 7, 2014, 5:43pm

Wouter,

Yes, it is possible that you have memory pressure. I'd probably:

1. Set bootstrap.mlockall: true in the elasticsearch.yml file.
2. Once you're up and running (or when these GC pauses start to happen),
   check the node stats to see what you have in memory:

curl "http://localhost:9200/_nodes/stats/jvm?pretty"

That will give you a rough idea if you might need to bump that ES_HEAP_SIZE
up some more (up to 1/2 of your available RAM or 30GB whichever is smaller).
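
The same check can be scripted. This is a rough sketch in Python, assuming the response contains a `nodes` map whose entries carry `jvm.mem.heap_used_in_bytes` and either `heap_max_in_bytes` or `heap_committed_in_bytes`; those field names are an assumption and should be verified against the output of the curl command above for your version. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def heap_usage(stats):
    """Map node name -> fraction of heap in use, from a parsed stats response."""
    usage = {}
    for node in stats.get("nodes", {}).values():
        mem = node["jvm"]["mem"]
        # Assumed field names; fall back to the committed size if no max is given.
        limit = mem.get("heap_max_in_bytes") or mem["heap_committed_in_bytes"]
        usage[node.get("name", "?")] = mem["heap_used_in_bytes"] / limit
    return usage


def fetch_heap_usage(url="http://localhost:9200/_nodes/stats/jvm"):
    """Fetch node stats over HTTP (same URL as the curl call) and summarize them."""
    with urlopen(url) as response:
        return heap_usage(json.load(response))


if __name__ == "__main__":
    for name, fraction in fetch_heap_usage().items():
        print(f"{name}: {fraction:.0%} of heap in use")
```

A node that stays close to 100% even right after a collection is a hint that the heap is too small for the workload, or that something is holding on to memory.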

If you're reaching the limits of RAM on a single node, then it might be
time to add more nodes to distribute those shards out horizontally.

Wouter_van_Atteveldt · April 15, 2014, 9:35am

Dear Binh Ly,

Thanks for your reply and sorry for not responding earlier. We've moved
over our elasticsearch to SSD and I had hoped that that might help with the
performance issues, but no luck.

It seems that whenever elastic is freshly started it performs pretty well,
but after a couple days it just becomes really slow and seems to be having
memory issues.

On Friday, February 7, 2014 6:43:08 PM UTC+1, Binh Ly wrote:
Yes it is possible that you have memory pressure. I'd probably:

Set bootstrap.mlockall: true in the elasticsearch.yml file

Once you're up and running (or when these GC pauses start to happen),
check the node stats to see what you have in memory:
curl "http://localhost:9200/_nodes/stats/jvm?pretty"
That will give you a rough idea if you might need to bump that
ES_HEAP_SIZE up some more (up to 1/2 of your available RAM or 30GB
whichever is smaller).

The server has 40G heap size (increased from 30) on a virtual machine with
56G in total, and mlockall is true. The machine is not swapping (it only
runs elastic and nginx/uwsgi). We are still using 0.90.10 on the production
server; I can switch that over to 1.x to see if it helps.

stats.json and the relevant config and log files are posted at
https://gist.github.com/vanatteveldt/10717100

elastic.init

$ cat /etc/init/elastic.conf
# ElasticSearch Service
 
description     "ElasticSearch"
 
start on (net-device-up
          and local-filesystems
          and runlevel [2345])
 
stop on runlevel [016]

elasticsearch.log

$ sudo tail -f /var/log/elastic/elasticsearch.log
[2014-04-15 10:18:15,227][WARN ][monitor.jvm              ] [Masque] [gc][old][79728][40] duration [1.1m], collections [1]/[1.1m], total [1.1m]/[55.6m], memory [39.8gb]->[39.8gb]/[39.9gb], all_pools {[young] [599mb]->[599mb]/[599mb]}{[survivor] [33.7mb]->[44mb]/[74.8mb]}{[old] [39.2gb]->[39.2gb]/[39.2gb]}
[2014-04-15 10:20:14,494][WARN ][monitor.jvm              ] [Masque] [gc][old][79729][41] duration [1.9m], collections [1]/[1.9m], total [1.9m]/[57.6m], memory [39.8gb]->[39.9gb]/[39.9gb], all_pools {[young] [599mb]->[599mb]/[599mb]}{[survivor] [44mb]->[51.5mb]/[74.8mb]}{[old] [39.2gb]->[39.2gb]/[39.2gb]}
[2014-04-15 10:21:23,949][WARN ][monitor.jvm              ] [Masque] [gc][old][79730][42] duration [1.1m], collections [1]/[1.1m], total [1.1m]/[58.7m], memory [39.9gb]->[39.9gb]/[39.9gb], all_pools {[young] [599mb]->[599mb]/[599mb]}{[survivor] [51.5mb]->[58.4mb]/[74.8mb]}{[old] [39.2gb]->[39.2gb]/[39.2gb]}
[2014-04-15 10:23:29,236][WARN ][monitor.jvm              ] [Masque] [gc][old][79731][43] duration [2m], collections [1]/[2m], total [2m]/[1h], memory [39.9gb]->[39.9gb]/[39.9gb], all_pools {[young] [599mb]->[599mb]/[599mb]}{[survivor] [58.4mb]->[61.7mb]/[74.8mb]}{[old] [39.2gb]->[39.2gb]/[39.2gb]}
[2014-04-15 10:24:38,779][WARN ][monitor.jvm              ] [Masque] [gc][old][79732][44] duration [1.1m], collections [1]/[1.1m], total [1.1m]/[1h], memory [39.9gb]->[39.9gb]/[39.9gb], all_pools {[young] [599mb]->[599mb]/[599mb]}{[survivor] [61.7mb]->[73.4mb]/[74.8mb]}{[old] [39.2gb]->[39.2gb]/[39.2gb]}
[2014-04-15 10:28:06,779][WARN ][monitor.jvm              ] [Masque] [gc][old][79733][46] duration [3.4m], collections [2]/[3.4m], total [3.4m]/[1h], memory [39.9gb]->[39.9gb]/[39.9gb], all_pools {[young] [599mb]->[599mb]/[599mb]}{[survivor] [73.4mb]->[68.3mb]/[74.8mb]}{[old] [39.2gb]->[39.2gb]/[39.2gb]}
[2014-04-15 10:30:04,377][WARN ][monitor.jvm              ] [Masque] [gc][old][79734][47] duration [1.9m], collections [1]/[1.9m], total [1.9m]/[1.1h], memory [39.9gb]->[39.9gb]/[39.9gb], all_pools {[young] [599mb]->[599mb]/[599mb]}{[survivor] [68.3mb]->[70.1mb]/[74.8mb]}{[old] [39.2gb]->[39.2gb]/[39.2gb]}
[2014-04-15 10:33:20,989][WARN ][monitor.jvm              ] [Masque] [gc][old][79735][48] duration [1.5m], collections [1]/[1.5m], total [1.5m]/[1.1h], memory [39.9gb]->[39.9gb]/[39.9gb], all_pools {[young] [599mb]->[598.9mb]/[599mb]}{[survivor] [70.1mb]->[71.2mb]/[74.8mb]}{[old] [39.2gb]->[39.2gb]/[39.2gb]}
[2014-04-15 10:36:10,950][WARN ][monitor.jvm              ] [Masque] [gc][old][79736][51] duration [4.5m], collections [3]/[4.5m], total [4.5m]/[1.2h], memory [39.9gb]->[39.9gb]/[39.9gb], all_pools {[young] [598.9mb]->[599mb]/[599mb]}{[survivor] [71.2mb]->[73.4mb]/[74.8mb]}{[old] [39.2gb]->[39.2gb]/[39.2gb]}

elasticsearch.yml

wva@amcat-production:~$ grep -v '^#\|^$' /srv/elastic/elasticsearch-0.90.10/config/elasticsearch.yml 
bootstrap.mlockall: true
index.routing.allocation.disable_allocation: false
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.query.bool.max_clause_count: 2048

If you're reaching the limits of RAM on a single node, then it might
be time to add more nodes to distribute those shards out horizontally.

Yeah I guess that would be the ultimate remedy, but I don't really have
budget at the moment to add servers.

Thanks for any help,

Wouter

jprante · April 15, 2014, 12:00pm

This is not Elasticsearch related. If you use a 40g heap of such extreme
size, you must expect that garbage collection must run for minutes, on
every JVM I know.

Jörg

Wouter_van_Atteveld1 · April 15, 2014, 1:42pm

On Tue, Apr 15, 2014 at 2:00 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

This is not Elasticsearch related. If you use a 40g heap of such extreme
size, you must expect that garbage collection must run for minutes, on
every JVM I know.

Right, but it is actually advised to give elastic a lot of heap, right? The
whole index is around 140G, so I would have thought that all frequently
used parts should get loaded in memory, but it still starts running slow
after a while.

Any ideas?

-- Wouter

nik9000 · April 15, 2014, 1:55pm

Go with 30GB. 30GB is magic because much over that (around 32GB) the JVM
can no longer use compressed object pointers, so part of the extra heap is
lost to larger pointers. You can learn more by clicking links in this:

Beyond that, you may want to look at what is actually happening when
collections are done. This article is about Cassandra but it seems pretty
on the ball:
http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads
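
The heap-size rule of thumb from this thread (half of the available RAM, capped at about 30GB) can be written down as a small Python sketch. The function name and the default cap are illustrative only; the exact point at which a JVM stops compressing pointers depends on the JVM, so treat 30 as a conservative assumption rather than a hard limit.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=30.0):
    """Half of RAM, but never more than the cap (rule of thumb from this thread)."""
    return min(total_ram_gb / 2.0, cap_gb)


if __name__ == "__main__":
    # RAM sizes mentioned in this thread: 48GB, 56GB and 64GB machines.
    for ram in (48, 56, 64):
        print(f"{ram}GB RAM -> about {recommended_heap_gb(ram):g}GB heap")
```

By this rule the 40G heap on the 56G machine is above both limits: more than half the RAM and more than the cap.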

Beyond that, scale out.

Nik

jprante · April 15, 2014, 2:40pm

The advice to give "a lot of heap" means: give as much heap as the JVM is
able to process efficiently. There is an upper limit due to the current
state of JVM engineering. You will not find JVMs that can efficiently manage
heaps larger than 32G (except rare expensive commercial JVM products). By
efficient I mean GC stalls under a second. There is heavy engineering going
on, known as the Shenandoah project, to tackle heaps over 100G with
millisecond GC pauses:
JEP 189: Shenandoah: A Low-Pause-Time Garbage Collector (Experimental)

Index size alone does not determine the right heap size. You need a large
heap if you want filters and aggregations/facets cached.

Example: I have 350G of index files. On my 3x64G RAM nodes I have assigned
3x16G heap and I do not cache filters, due to the nature of my queries. The
other ~48G per node I left to the OS, for file system buffers (direct I/O is
the key to fast systems). If I assigned 32G to heap, GC times would be
unacceptably high, and the system would go sluggish after some days, as you
described. It is not a matter of heap size alone, but of carefully balancing
JVM management abilities against operating system I/O power. The challenge
is that different ES workload patterns require different balances.
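
To make the balance concrete, here is a back-of-the-envelope sketch in Python using the numbers from this thread. It assumes that all RAM not given to the heap is available to the OS for file system buffers (ignoring other processes such as nginx/uwsgi) and that the 350G index is spread evenly over the three nodes; both are simplifications, and `memory_split` is a hypothetical helper.

```python
def memory_split(total_ram_gb, heap_gb, index_gb):
    """Return RAM left for the OS and the share of the index it could buffer."""
    os_gb = total_ram_gb - heap_gb
    return {"os_gb": os_gb, "cacheable_fraction": min(1.0, os_gb / index_gb)}


if __name__ == "__main__":
    setups = {
        # 64G RAM, 16G heap, one third of a 350G index per node (assumed even).
        "3-node example": memory_split(64, 16, 350 / 3),
        # 56G RAM, 40G heap, roughly 140G index on a single node.
        "single 40G-heap node": memory_split(56, 40, 140),
    }
    for name, split in setups.items():
        print(f"{name}: {split['os_gb']}G for the OS, "
              f"{split['cacheable_fraction']:.0%} of the index could be buffered")
```

Under these assumptions the 16G-heap nodes leave far more room for file system buffers relative to their share of the index than the single node with a 40G heap does.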

Jörg

Wouter_van_Atteveld1 · April 15, 2014, 8:11pm

Thanks for the explanation, that really helps.

Does that mean that on a virtual host with 64GB memory it might make sense
to make two virtual servers each running a node? I had expected that
multiple nodes on a single host would not help, but I guess if the VM is
the limitation it might?

I have a read-heavy workload, with good use of facets/aggregations and also
some really complex queries (>1000 terms), but most of them limited to
subsets of <10k or 100k documents (out of 50M). Any recommendations would
be much appreciated!

-- Wouter
