GC Allocation ERROR


(BIJAY) #1

Hi,

Can anyone help me in getting this error resolved?

[2018-10-08T04:09:29,360][INFO ][o.e.n.Node               ] [lmm-elasticsearch-ingest-0] JVM arguments [-Xms10g, -Xmx10g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss10m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.57Q1Wksd, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/heap/dump/path, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config]
2018-10-08T04:09:30.912+0000: Total time for which application threads were stopped: 0.0003064 seconds, Stopping threads took: 0.0001065 seconds
2018-10-08T04:09:31.030+0000: [GC (CMS Initial Mark) To enable GC log rotation, use -Xloggc:<filename> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<num_of_files>
where num_of_file > 0
GC log rotation is turned off
[1 CMS-initial-mark: 0K(9378240K)] 549378K(10375040K), 0.0340352 secs] [Times: user=0.17 sys=0.00, real=0.04 secs] 

Desired survivor size 56688640 bytes, new threshold 1 (max 6)
- age   1:   68553896 bytes,   68553896 total
: 886080K->67078K(996800K), 0.1413758 secs] 886080K->67078K(10375040K), 0.1414290 secs] [Times: user=0.25 sys=0.00, real=0.14 secs] 
2018-10-08T04:09:32.059+0000: Total time for which application threads were stopped: 0.1416038 seconds, Stopping threads took: 0.0000628 seconds
2018-10-08T04:09:32.059+0000: [GC (CMS Final Remark) [YG occupancy: 83911 K (996800 K)]2018-10-08T04:09:32.059+0000: [Rescan (parallel) , 0.0825867 secs]2018-10-08T04:09:32.142+0000: [weak refs processing, 0.0000361 secs]2018-10-08T04:09:32.142+0000: [class unloading, 0.0038167 secs]2018-10-08T04:09:32.146+0000: [scrub symbol table, 0.0030758 secs]2018-10-08T04:09:32.149+0000: [scrub string table, 0.0003422 secs][1 CMS-remark: 0K(9378240K)] 83911K(10375040K), 0.0906259 secs] [Times: user=0.33 sys=0.00, real=0.09 secs] 



Desired survivor size 56688640 bytes, new threshold 1 (max 6)
- age   1:  105808944 bytes,  105808944 total
: 953158K->110720K(996800K), 0.2960513 secs] 953158K->152236K(10375040K), 0.2961296 secs] [Times: user=0.55 sys=0.02, real=0.29 secs] 
2018-10-08T04:09:43.101+0000: Total time for which application threads were stopped: 0.2964850 seconds, Stopping threads took: 0.0001872 seconds
2018-10-08T04:09:43.236+0000: Total time for which application threads were stopped: 0.0055755 seconds, Stopping threads took: 0.0052903 seconds
[2018-10-08T04:09:43,336][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] detected_master {lmm-elasticsearch-master-0}{Pd1P4gx0TVGHq8gILQ5OJw}{g41VOXutS4Sb9CEiUPK3Uw}{10.1.26.67}{10.1.26.67:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true}, added {{lmm-elasticsearch-master-1}{vpgEsjtLQvizMmtB-RqDgg}{lCJq-NarQISONs7mtvkBiQ}{10.1.44.102}{10.1.44.102:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-data-1}{6VSfUQ8GST2yF8zZqIjwwA}{89jUESjNRnuVbP7jJprrpA}{10.1.34.87}{10.1.34.87:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-data-4}{TdDXdLPOTLG-Z2KVAXTKsg}{GG02SK2ZQQ-tOHAGbQ1GBw}{10.1.12.62}{10.1.12.62:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-master-2}{MQRg5HvgQ6-7mrxIAZ8gPA}{JEVTUQwdTzW6vAzwgbF-Xg}{10.1.16.43}{10.1.16.43:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-data-3}{WhaAwVG5QwC8wc9PHubAfA}{UEBLhmFnQUCIT_m-Kp11pw}{10.1.28.58}{10.1.28.58:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-ingest-1}{eP8lT5xhTg2czRw_cLpjqg}{BIiLSgRESEi2KDnYArTAyg}{10.1.42.48}{10.1.42.48:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-master-0}{Pd1P4gx0TVGHq8gILQ5OJw}{g41VOXutS4Sb9CEiUPK3Uw}{10.1.26.67}{10.1.26.67:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-data-0}{t4gJgBRJTBqZs3jgs4r3pQ}{OGWNOIiWTNuFJTfDLGQ3mw}{10.1.44.104}{10.1.44.104:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-data-2}{iic3VR5DQSu1mQrc9BFq7g}{YAM0kfmGQWWF0b1b6eUMtg}{10.1.18.31}{10.1.18.31:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-0}{Pd1P4gx0TVGHq8gILQ5OJw}{g41VOXutS4Sb9CEiUPK3Uw}{10.1.26.67}{10.1.26.67:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [500]])
2018-10-08T04:09:43.356+0000: Total time for which application threads were stopped: 0.0095804 seconds, Stopping threads took: 0.0092086 seconds
[2018-10-08T04:09:43,530][INFO ][o.e.m.j.JvmGcMonitorService] [lmm-elasticsearch-ingest-0] [gc][4] overhead, spent [296ms] collecting in the last [1s]
[2018-10-08T04:09:43,650][INFO ][o.e.l.LicenseService     ] [lmm-elasticsearch-ingest-0] license [2980849b-1faa-425e-883b-816f1a329173] mode [trial] - valid
[2018-10-08T04:09:43,660][INFO ][o.e.h.n.Netty4HttpServerTransport] [lmm-elasticsearch-ingest-0] publish_address {10.1.28.56:9200}, bound_addresses {[::]:9200}
[2018-10-08T04:09:43,660][INFO ][o.e.n.Node               ] [lmm-elasticsearch-ingest-0] started
 CMS: abort preclean due to time 2018-10-08T04:09:43.722+0000: [CMS-concurrent-abortable-preclean: 4.249/5.367 secs] [Times: user=7.09 sys=0.47, real=5.36 secs] 
2018-10-08T04:09:43.723+0000: [GC (CMS Final Remark) [YG occupancy: 725644 K (996800 K)]2018-10-08T04:09:43.723+0000: [Rescan (parallel) , 0.0096169 secs]2018-10-08T04:09:43.733+0000: [weak refs processing, 0.0001615 secs]2018-10-08T04:09:43.733+0000: [class unloading, 0.0072790 secs]2018-10-08T04:09:43.740+0000: [scrub symbol table, 0.0073525 secs]2018-10-08T04:09:43.747+0000: [scrub string table, 0.0007101 secs][1 CMS-remark: 41516K(9378240K)] 767160K(10375040K), 0.0265786 secs] [Times: user=0.10 sys=0.00, real=0.03 secs] 



2018-07-26T08:31:25.851+0000: [GC (Allocation Failure) 2018-07-26T08:31:25.851+0000: [ParNew
Desired survivor size 17891328 bytes, new threshold 1 (max 6)
- age   1:   28026592 bytes,   28026592 total
: 314144K->28172K(314560K), 0.3048318 secs] 326577K->41099K(1013632K), 0.3048918 secs] [Times: user=0.31 sys=0.01, real=0.31 secs]

Thanks


(Daniel Mitterdorfer) #2

Hi,

first of all it seems that the Elasticsearch log and the JVM's garbage collection log are intermingled in the output. I think this is because you did not set the parameter -Xloggc. You should set it and let it point to a file, e.g. -Xloggc:/var/log/elasticsearch/gc.log.

Secondly, an "allocation failure" is not an error but a normal case in the garbage collector. This just means that the garbage collector is now about to run in order to free up memory. To read more about the CMS garbage collector, please check Oracle's GC tuning docs (that link is applicable to Java 11 but the information in there is applicable for Java 8 as well).

Daniel


(BIJAY) #3

Thanks a lot Daniel for the input.
I tried setting the parameter -Xloggc and also pointed to -Xloggc:/var/log/elasticsearch/gc.log .

It didn't help. I understand that "allocation failure" is not an error but a normal case in the garbage collector.

But I couldn't figure out from the given link or from any other resources how to fix the issue. It will be really helpful if you can suggest something.

Thanks
Bijay


(Daniel Mitterdorfer) #4

Hi,

Can you please tell specifically to what respect that did not help? The GC log output is written to a separate file when you specify -Xloggc

I am sorry but I do not understand which issue you are attempting to fix? Do you mean that you want to avoid the output "allocation failure"?

Daniel


(BIJAY) #5

With respect to your first question, it didn't help fixing the GC allocation error.
And I want to avoid the crashing of the elastic-search. And there is no other relevant error apart from the GC allocation error.

Bijay


(BIJAY) #6

I could just see the gc logs are redirected to the path and fyi, this is happening for the elasticsearch ingest nodes.

-Bijay


(Daniel Mitterdorfer) #7

Hi,

let me state this once more: the GC log entry you are seeing is normal and you cannot avoid this. It only means that the garbage collector is about to run and this is a healthy thing.

How and when does Elasticsearch "crash"? According to the logs that you have pasted initially, the node started fine:

[2018-10-08T04:09:43,660][INFO ][o.e.n.Node               ] [lmm-elasticsearch-ingest-0] started

If the node terminates later on for any reason, there should be an entry in the log, a core dump, heap dump or the like to provide more evidence.

Daniel


(BIJAY) #8

Please find the log below.

Starting elasticsearch ...
2018-10-07T04:40:44.286+0000: Total time for which application threads were stopped: 0.0001605 seconds, Stopping threads took: 0.0000563 seconds
[2018-10-07T04:40:46,837][INFO ][o.e.n.Node ] [lmm-elasticsearch-ingest-0] initializing ...
2018-10-07T04:40:46.883+0000: Total time for which application threads were stopped: 0.0004247 seconds, Stopping threads took: 0.0000745 seconds
[2018-10-07T04:40:46,895][INFO ][o.e.e.NodeEnvironment ] [lmm-elasticsearch-ingest-0] using [1] data paths, mounts [[/var/data/elasticsearch (/dev/xvda1)]], net usable_space [12.4gb], net total_space [79.9gb], types [xfs]
[2018-10-07T04:40:46,896][INFO ][o.e.e.NodeEnvironment ] [lmm-elasticsearch-ingest-0] heap size [11.3gb], compressed ordinary object pointers [true]
[2018-10-07T04:40:46,897][INFO ][o.e.n.Node ] [lmm-elasticsearch-ingest-0] node name [lmm-elasticsearch-ingest-0], node ID [f73VJ63NQruVRhMMdAPTmQ]
[2018-10-07T04:40:46,897][INFO ][o.e.n.Node ] [lmm-elasticsearch-ingest-0] version[6.2.2], pid[38], build[10b1edd/2018-02-16T19:01:30.685723Z], OS[Linux/3.10.0-693.17.1.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_161/25.161-b14]
[2018-10-07T04:40:46,897][INFO ][o.e.n.Node ] [lmm-elasticsearch-ingest-0] JVM arguments [-Xms12g, -Xmx12g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:MaxNewSize=7g, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.UxIymGYo, -XX:+HeapDumpOnOutOfMemoryError, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config]
2018-10-07T04:40:48.268+0000: Total time for which application threads were stopped: 0.0004576 seconds, Stopping threads took: 0.0001058 seconds
2018-10-07T04:40:50.198+0000: Total time for which application threads were stopped: 0.0003106 seconds, Stopping threads took: 0.0000783 seconds
2018-10-07T04:40:50.198+0000: [GC (CMS Initial Mark) To enable GC log rotation, use -Xloggc: -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<num_of_files>
where num_of_file > 0
GC log rotation is turned off
[1 CMS-initial-mark: 0K(5242880K)] 1644223K(11848960K), 0.1590150 secs] [Times: user=0.32 sys=0.00, real=0.16 secs]
2018-10-07T04:40:50.357+0000: Total time for which application threads were stopped: 0.1592459 seconds, Stopping threads took: 0.0000894 seconds
2018-10-07T04:40:50.357+0000: [CMS-concurrent-mark-start]
2018-10-07T04:40:50.359+0000: [CMS-concurrent-mark: 0.001/0.001 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2018-10-07T04:40:50.359+0000: [CMS-concurrent-preclean-start]
2018-10-07T04:40:50.365+0000: [CMS-concurrent-preclean: 0.006/0.006 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2018-10-07T04:40:50.365+0000: [CMS-concurrent-abortable-preclean-start]
2018-10-07T04:40:55.358+0000: Total time for which application threads were stopped: 0.0004360 seconds, Stopping threads took: 0.0000803 seconds
CMS: abort preclean due to time 2018-10-07T04:40:55.431+0000: [CMS-concurrent-abortable-preclean: 1.670/5.066 secs] [Times: user=10.01 sys=0.11, real=5.06 secs]
2018-10-07T04:40:55.432+0000: Total time for which application threads were stopped: 0.0006131 seconds, Stopping threads took: 0.0000574 seconds
2018-10-07T04:40:55.433+0000: [GC (CMS Final Remark) [YG occupancy: 2113994 K (6606080 K)]2018-10-07T04:40:55.433+0000: [Rescan (parallel) , 1.0069157 secs]2018-10-07T04:40:56.439+0000: [weak refs processing, 0.0001541 secs]2018-10-07T04:40:56.440+0000: [class unloading, 0.0054691 secs]2018-10-07T04:40:56.445+0000: [scrub symbol table, 0.0058738 secs]2018-10-07T04:40:56.451+0000: [scrub string table, 0.0006907 secs][1 CMS-remark: 0K(5242880K)] 2113994K(11848960K), 1.0209028 secs] [Times: user=2.04 sys=0.01, real=1.03 secs]
2018-10-07T04:40:56.453+0000: Total time for which application threads were stopped: 1.0212395 seconds, Stopping threads took: 0.0002442 seconds
2018-10-07T04:40:56.457+0000: [CMS-concurrent-sweep-start]
2018-10-07T04:40:56.457+0000: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-10-07T04:40:56.457+0000: [CMS-concurrent-reset-start]


(BIJAY) #9

Continues..

2018-10-07T04:40:56.470+0000: [CMS-concurrent-reset: 0.013/0.013 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
2018-10-07T04:40:57.454+0000: Total time for which application threads were stopped: 0.0003499 seconds, Stopping threads took: 0.0001688 seconds
[2018-10-07T04:40:57,699][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/158] [Main.cc@128] controller (64 bit): Version 6.2.2 (Build 61c43548e5e0f2) Copyright (c) 2018 Elasticsearch BV
[2018-10-07T04:40:57,919][INFO ][o.e.d.DiscoveryModule ] [lmm-elasticsearch-ingest-0] using discovery type [zen]
2018-10-07T04:40:58.455+0000: Total time for which application threads were stopped: 0.0003864 seconds, Stopping threads took: 0.0001400 seconds
2018-10-07T04:40:58.560+0000: Total time for which application threads were stopped: 0.0007151 seconds, Stopping threads took: 0.0000892 seconds
[2018-10-07T04:40:59,443][INFO ][o.e.n.Node ] [lmm-elasticsearch-ingest-0] initialized
[2018-10-07T04:40:59,444][INFO ][o.e.n.Node ] [lmm-elasticsearch-ingest-0] starting ...
2018-10-07T04:40:59.466+0000: Total time for which application threads were stopped: 0.0007060 seconds, Stopping threads took: 0.0000719 seconds
2018-10-07T04:40:59.585+0000: Total time for which application threads were stopped: 0.0006127 seconds, Stopping threads took: 0.0000669 seconds
2018-10-07T04:40:59.606+0000: Total time for which application threads were stopped: 0.0003379 seconds, Stopping threads took: 0.0000651 seconds
[2018-10-07T04:40:59,607][INFO ][o.e.t.TransportService ] [lmm-elasticsearch-ingest-0] publish_address {10.1.28.37:9300}, bound_addresses {10.1.28.37:9300}
[2018-10-07T04:40:59,617][INFO ][o.e.b.BootstrapChecks ] [lmm-elasticsearch-ingest-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
2018-10-07T04:41:02.710+0000: [GC (Allocation Failure) 2018-10-07T04:41:02.711+0000: [ParNew
Desired survivor size 375783424 bytes, new threshold 6 (max 6)

  • age 1: 205688280 bytes, 205688280 total
    : 5872128K->201137K(6606080K), 0.2407542 secs] 5872128K->201137K(11848960K), 0.2408296 secs] [Times: user=0.41 sys=0.00, real=0.24 secs]
    2018-10-07T04:41:02.951+0000: Total time for which application threads were stopped: 0.2411593 seconds, Stopping threads took: 0.0001515 seconds
    2018-10-07T04:41:02.952+0000: [GC (CMS Initial Mark) [1 CMS-initial-mark: 0K(5242880K)] 259858K(11848960K), 0.0331727 secs] [Times: user=0.09 sys=0.00, real=0.04 secs]
    2018-10-07T04:41:02.985+0000: Total time for which application threads were stopped: 0.0334700 seconds, Stopping threads took: 0.0001271 seconds
    2018-10-07T04:41:02.985+0000: [CMS-concurrent-mark-start]
    2018-10-07T04:41:02.986+0000: [CMS-concurrent-mark: 0.001/0.001 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
    2018-10-07T04:41:02.986+0000: [CMS-concurrent-preclean-start]
    2018-10-07T04:41:02.999+0000: [CMS-concurrent-preclean: 0.012/0.012 secs] [Times: user=0.08 sys=0.01, real=0.01 secs]
    2018-10-07T04:41:02.999+0000: [CMS-concurrent-abortable-preclean-start]
    2018-10-07T04:41:03.242+0000: Total time for which application threads were stopped: 0.0108150 seconds, Stopping threads took: 0.0105292 seconds
    [2018-10-07T04:41:03,342][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] detected_master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true}, added {{lmm-elasticsearch-data-1}{wD7uOaMuTJelr_NNrIyWjg}{WI0EvqZvQyeV76aHZBHy5w}{10.1.30.128}{10.1.30.128:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-ingest-4}{JcvKthKxS4mQeN_wpwFcFg}{kn96dlnpTeGN3xxLeWEHqQ}{10.1.42.44}{10.1.42.44:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-data-0}{cd5vIogbRaafhy4bGAhW7w}{sbVI-u7ZTLqJu0-lwGUYsw}{10.1.34.86}{10.1.34.86:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-data-3}{2aCLR_45RuSy5UKmnRHszw}{3bILECioQUuMoRJsXma_ew}{10.1.38.38}{10.1.38.38:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-master-2}{nk1TcJ9qSDyjXaXtvAQ1hQ}{DfRMY0gtRAe7TOcnGCiP1Q}{10.1.40.41}{10.1.40.41:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-data-2}{dQ4PDT6_TpafhrMTfj5iNw}{_DPXZD4bTD6fm0dHbrkE8A}{10.1.40.42}{10.1.40.42:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-ingest-3}{XBAlQOrFQg6fu0XlGuhR9w}{EVViETOUR0yl9Yx5JDlM1g}{10.1.26.49}{10.1.26.49:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-ingest-2}{Z43OHma0TNGwW5aqkblf6A}{fmUDtTpXTP2Jf-6-2fPypQ}{10.1.44.96}{10.1.44.96:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-data-4}{Fe9ulXOHQVutShmk9-mbeA}{A8Pv628lSvuoToCJoIebFQ}{10.1.12.53}{10.1.12.53:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-ingest-1}{06606u0pQGub7kHC20O_UQ}{tTd3RxsnR82YtZm44Iz0dg}{10.1.16.37}{10.1.16.37:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},{lmm-elasticsearch-master-0}{KklqatisSHKAJm5R4z88LQ}{ad5DB3fBTT6b9ZGYQOZW6g}{10.1.30.127}{10.1.30.127:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [963]])
    [2018-10-07T04:41:03,449][INFO ][o.e.m.j.JvmGcMonitorService] [lmm-elasticsearch-ingest-0] [gc][4] overhead, spent [273ms] collecting in the last [1s]
    2018-10-07T04:41:03.643+0000: Total time for which application threads were stopped: 0.0004753 seconds, Stopping threads took: 0.0000873 seconds
    [2018-10-07T04:41:03,658][INFO ][o.e.l.LicenseService ] [lmm-elasticsearch-ingest-0] license [1d17d0dc-ff9f-4013-ad5a-8b9b510a7758] mode [trial] - valid
    [2018-10-07T04:41:03,668][INFO ][o.e.h.n.Netty4HttpServerTransport] [lmm-elasticsearch-ingest-0] publish_address {10.1.28.37:9200}, bound_addresses {[::]:9200}
    [2018-10-07T04:41:03,668][INFO ][o.e.n.Node ] [lmm-elasticsearch-ingest-0] started
    2018-10-07T04:41:03.741+0000: Total time for which application threads were stopped: 0.0004497 seconds, Stopping threads took: 0.0000773 seconds

(BIJAY) #10

CMS: abort preclean due to time 2018-10-07T04:41:08.572+0000: [CMS-concurrent-abortable-preclean: 4.819/5.573 secs] [Times: user=5.43 sys=0.31, real=5.57 secs]

2018-10-07T04:41:08.572+0000: [GC (CMS Final Remark) [YG occupancy: 2938810 K (6606080 K)]2018-10-07T04:41:08.572+0000: [Rescan (parallel) , 0.1699238 secs]2018-10-07T04:41:08.742+0000: [weak refs processing, 0.0002105 secs]2018-10-07T04:41:08.743+0000: [class unloading, 0.0124790 secs]2018-10-07T04:41:08.755+0000: [scrub symbol table, 0.0110952 secs]2018-10-07T04:41:08.766+0000: [scrub string table, 0.0005686 secs][1 CMS-remark: 0K(5242880K)] 2938810K(11848960K), 0.1959866 secs] [Times: user=0.46 sys=0.00, real=0.20 secs]

2018-10-07T04:41:08.768+0000: Total time for which application threads were stopped: 0.1963326 seconds, Stopping threads took: 0.0000853 seconds

2018-10-07T04:41:08.768+0000: [CMS-concurrent-sweep-start]

2018-10-07T04:41:08.769+0000: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

2018-10-07T04:41:08.769+0000: [CMS-concurrent-reset-start]

2018-10-07T04:41:08.780+0000: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.03 sys=0.01, real=0.01 secs]

2018-10-07T04:41:09.769+0000: Total time for which application threads were stopped: 0.0003577 seconds, Stopping threads took: 0.0000663 seconds

[2018-10-07T04:41:42,759][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] removed {{lmm-elasticsearch-ingest-3}{XBAlQOrFQg6fu0XlGuhR9w}{EVViETOUR0yl9Yx5JDlM1g}{10.1.26.49}{10.1.26.49:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [964]])

2018-10-07T04:41:43.079+0000: Total time for which application threads were stopped: 0.0002978 seconds, Stopping threads took: 0.0000739 seconds

2018-10-07T04:41:44.080+0000: Total time for which application threads were stopped: 0.0003110 seconds, Stopping threads took: 0.0000743 seconds

2018-10-07T04:41:47.081+0000: Total time for which application threads were stopped: 0.0002617 seconds, Stopping threads took: 0.0000612 seconds

[2018-10-07T04:41:47,539][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] removed {{lmm-elasticsearch-ingest-1}{06606u0pQGub7kHC20O_UQ}{tTd3RxsnR82YtZm44Iz0dg}{10.1.16.37}{10.1.16.37:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [965]])

[2018-10-07T04:41:47,656][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] removed {{lmm-elasticsearch-ingest-2}{Z43OHma0TNGwW5aqkblf6A}{fmUDtTpXTP2Jf-6-2fPypQ}{10.1.44.96}{10.1.44.96:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [966]])

2018-10-07T04:41:48.081+0000: Total time for which application threads were stopped: 0.0003340 seconds, Stopping threads took: 0.0000859 seconds

2018-10-07T04:41:56.866+0000: Total time for which application threads were stopped: 0.0003300 seconds, Stopping threads took: 0.0000848 seconds

[2018-10-07T04:41:57,716][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] removed {{lmm-elasticsearch-ingest-4}{JcvKthKxS4mQeN_wpwFcFg}{kn96dlnpTeGN3xxLeWEHqQ}{10.1.42.44}{10.1.42.44:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [967]])

[2018-10-07T04:42:09,598][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] added {{lmm-elasticsearch-ingest-3}{XBAlQOrFQg6fu0XlGuhR9w}{qkL3YWaSSrG0Zumomzw86Q}{10.1.26.49}{10.1.26.49:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [968]])

[2018-10-07T04:42:12,535][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] added {{lmm-elasticsearch-ingest-2}{Z43OHma0TNGwW5aqkblf6A}{AvnOgt28TiGcdtbFq2liZg}{10.1.44.96}{10.1.44.96:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [969]])

2018-10-07T04:42:12.873+0000: Total time for which application threads were stopped: 0.0002776 seconds, Stopping threads took: 0.0000923 seconds

[2018-10-07T04:42:13,387][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] added {{lmm-elasticsearch-ingest-1}{06606u0pQGub7kHC20O_UQ}{QdOFoRHtTOKNQc2RecqGeg}{10.1.16.37}{10.1.16.37:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [970]])

[2018-10-07T04:42:23,167][INFO ][o.e.c.s.ClusterApplierService] [lmm-elasticsearch-ingest-0] added {{lmm-elasticsearch-ingest-4}{JcvKthKxS4mQeN_wpwFcFg}{yC_YE3l9Rg-afO1R1mnJvw}{10.1.42.44}{10.1.42.44:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true},}, reason: apply cluster state (from master [master {lmm-elasticsearch-master-1}{HLiDEs4mSyyLY5NVdrrfxg}{yEmQjtlbQfyLEcVZpGXHrw}{10.1.20.35}{10.1.20.35:9300}{ml.machine_memory=67371880448, ml.max_open_jobs=20, ml.enabled=true} committed version [971]])

2018-10-07T04:42:23.876+0000: Total time for which application threads were stopped: 0.0003307 seconds, Stopping threads took: 0.0001009 seconds


(Daniel Mitterdorfer) #11

Hi,

Nothing in these logs points to an error. How do you determine that there is a problem? Is the node unreachable? Did you check the other things that I pointed you to (core dumps, heap dumps)?

Daniel


(BIJAY) #12

Hi Daniel,

The node crashes continuously after a certain timeframe (once in 4 hours) and I see there is core dump.

sh-4.2# ls -l /dev/core
lrwxrwxrwx. 1 root root 11 Oct 12 08:22 /dev/core -> /proc/kcore
sh-4.2# ls -l /proc/kcore
-r--------. 1 root root 140737486266368 Oct 12 08:59 /proc/kcore

Thanks
Bijay


(Daniel Mitterdorfer) #13

Hi,

this is not a core dump but rather the Linux kernel's core image (see kernel docs).

I think at this point we have solved the original question, namely what the output GC (Allocation Failure) failure means so I think in the interest of keeping the discussion focused on the topic at hand, I'd ask you to raise a new question about your periodic node failures. In order to get it resolved, I suggest to provide specific evidence in the form of logs that show how you have determined that the node is down and also provide a summary of your own analysis so far.

Daniel


(BIJAY) #14

This issue can be closed. I have removed the ingest part from the ELK.

Thanks a lot Daniel for your time.

Thanks
Bijay


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.