Hi Team,
Because of an unrelated issue, the disk on my server filled up.
I then got the following error in elasticsearch.log:
[2018-06-27T17:37:15,983][WARN ][o.e.c.r.a.DiskThresholdMonitor] [l49aWt5] flood stage disk watermark [5gb] exceeded on [l49aWt56RrWuszAP0tiU7w][l49aWt5][/var/lib/elasticsearch/nodes/0] free: 3.5gb[23.4%], all indices on this node will marked read-only
[2018-06-27T17:38:55,624][INFO ][o.e.n.Node ] [l49aWt5] stopping ...
[2018-06-27T17:38:55,654][INFO ][o.e.n.Node ] [l49aWt5] stopped
[2018-06-27T17:38:55,654][INFO ][o.e.n.Node ] [l49aWt5] closing ...
[2018-06-27T17:38:55,664][INFO ][o.e.n.Node ] [l49aWt5] closed
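To make the numbers in the DiskThresholdMonitor warning above easier to compare, here is a minimal sketch that pulls the watermark and the free space out of that line. The regex and the name `parse_flood_stage` are my own assumptions, written against this one quoted line only, not against any documented log format.

```python
import re

# Hypothetical pattern: derived from the single log line quoted in this post,
# not from any documented Elasticsearch log format.
WATERMARK_RE = re.compile(
    r"flood stage disk watermark \[(?P<watermark>[^\]]+)\] exceeded"
    r".*?free: (?P<free>[^\[\s]+)\[(?P<free_pct>[\d.]+)%\]"
)


def parse_flood_stage(line):
    """Return watermark, free space and free percentage, or None if no match."""
    match = WATERMARK_RE.search(line)
    if match is None:
        return None
    return {
        "watermark": match.group("watermark"),
        "free": match.group("free"),
        "free_pct": float(match.group("free_pct")),
    }


# The warning line from elasticsearch.log, split only for readability.
sample = (
    "[2018-06-27T17:37:15,983][WARN ][o.e.c.r.a.DiskThresholdMonitor] [l49aWt5] "
    "flood stage disk watermark [5gb] exceeded on "
    "[l49aWt56RrWuszAP0tiU7w][l49aWt5][/var/lib/elasticsearch/nodes/0] "
    "free: 3.5gb[23.4%], all indices on this node will marked read-only"
)
print(parse_flood_stage(sample))
```

For this line it reports a watermark of 5gb against 3.5gb free, which is why the node marked its indices read-only.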
Today I cleaned up the disk and increased its size with the help of my network team.
However, I am now unable to restart Elasticsearch.
It writes the following to gc.log.0.current:
OpenJDK 64-Bit Server VM (25.161-b14) for linux-amd64 JRE (1.8.0_161-b14), built on Jan 17 2018 16:35:30 by "mockbuild" with gcc 4.8.5 20150623 (Red Hat 4.8.5-16)
Memory: 4k page, physical 16259528k(1444848k free), swap 3907580k(3527968k free)
CommandLine flags: -XX:+AlwaysPreTouch -XX:CMSInitiatingOccupancyFraction=75 -XX:GCLogFileSize=67108864 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:InitialHeapSize=1073741824 -XX:MaxHeapSize=1073741824 -XX:MaxNewSize=348966912 -XX:MaxTenuringThreshold=6 -XX:NewSize=348966912 -XX:NumberOfGCLogFiles=32 -XX:OldPLABSize=16 -XX:OldSize=697933824 -XX:-OmitStackTraceInFastThrow -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:ThreadStackSize=1024 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC
2018-10-29T15:15:51.750+0100: 0.638: Total time for which application threads were stopped: 0.0001216 seconds, Stopping threads took: 0.0000400 seconds
.
.
Desired survivor size 17432576 bytes, new threshold 6 (max 6)
- age 1: 16036816 bytes, 16036816 total
.
.
Desired survivor size 17432576 bytes, new threshold 1 (max 6)
- age 1: 18658672 bytes, 18658672 total
- age 2: 3846248 bytes, 22504920 total
.
.
2018-10-29T15:15:55.327+0100: 4.214: [CMS-concurrent-abortable-preclean: 0.976/1.112 secs] [Times: user=3.93 sys=0.06, real=1.12 secs]
2018-10-29T15:15:55.328+0100: 4.215: [GC (CMS Final Remark) [YG occupancy: 160118 K (306688 K)]2018-10-29T15:15:55.328+0100: 4.215: [Rescan (parallel) , 0.0286394 secs]2018-10-29T15:15:55.357+0100: 4.244: [weak refs processing, 0.0001317 secs]2018-10-29T15:15:55.357+0100: 4.244: [class unloading, 0.0049511 secs]2018-10-29T15:15:55.362+0100: 4.249: [scrub symbol table, 0.0056587 secs]2018-10-29T15:15:55.368+0100: 4.255: [scrub string table, 0.0003497 secs][1 CMS-remark: 0K(707840K)] 160118K(1014528K), 0.0405616 secs] [Times: user=0.10 sys=0.00, real=0.04 secs]
2018-10-29T15:15:55.369+0100: 4.256: Total time for which application threads were stopped: 0.0406876 seconds, Stopping threads took: 0.0000397 seconds
2018-10-29T15:15:55.369+0100: 4.256: [CMS-concurrent-sweep-start]
2018-10-29T15:15:55.369+0100: 4.256: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-10-29T15:15:55.369+0100: 4.256: [CMS-concurrent-reset-start]
2018-10-29T15:15:55.370+0100: 4.257: [CMS-concurrent-reset: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-10-29T15:15:55.615+0100: 4.502: Total time for which application threads were stopped: 0.0002267 seconds, Stopping threads took: 0.0000466 seconds
2018-10-29T15:15:55.868+0100: 4.756: Total time for which application threads were stopped: 0.0005062 seconds, Stopping threads took: 0.0000480 seconds
2018-10-29T15:15:56.061+0100: 4.949: Total time for which application threads were stopped: 0.0004933 seconds, Stopping threads took: 0.0000363 seconds
2018-10-29T15:15:56.196+0100: 5.083: Total time for which application threads were stopped: 0.0006007 seconds, Stopping threads took: 0.0001575 seconds
Heap
par new generation total 306688K, used 276489K [0x00000000c0000000, 0x00000000d4cc0000, 0x00000000d4cc0000)
eden space 272640K, 92% used [0x00000000c0000000, 0x00000000cf72a618, 0x00000000d0a40000)
from space 34048K, 68% used [0x00000000d0a40000, 0x00000000d21181c0, 0x00000000d2b80000)
to space 34048K, 0% used [0x00000000d2b80000, 0x00000000d2b80000, 0x00000000d4cc0000)
concurrent mark-sweep generation total 707840K, used 0K [0x00000000d4cc0000, 0x0000000100000000, 0x0000000100000000)
Metaspace used 43831K, capacity 47042K, committed 47172K, reserved 1089536K
class space used 5815K, capacity 6970K, committed 6996K, reserved 1048576K
When I start the elasticsearch service, it fails with the following error:
[root@xxx elasticsearch]# systemctl status elasticsearch -l
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2018-10-29 15:49:35 CET; 4h 30min ago
Docs: http://www.elastic.co
Process: 6445 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet (code=exited, status=1/FAILURE)
Main PID: 6445 (code=exited, status=1/FAILURE)
Oct 29 15:49:29 xxx systemd[1]: Started Elasticsearch.
Oct 29 15:49:29 xxx systemd[1]: Starting Elasticsearch...
Oct 29 15:49:30 xxx elasticsearch[6445]: 2018-10-29 15:49:30,532 main ERROR Invalid status level specified: ERROR# LOG ACTION EXECUTION ERRORS FOR EASIER DEBUGGING. Defaulting to ERROR.
Oct 29 15:49:35 xxx systemd[1]: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
Oct 29 15:49:35 xxx systemd[1]: Unit elasticsearch.service entered failed state.
Oct 29 15:49:35 xxx systemd[1]: elasticsearch.service failed.
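The `Invalid status level specified` message above looks as if the `status` value in a Log4j 2 properties file has run together with a following comment line. That is only a guess from the message text. Below is a minimal sketch to check for it. The function name `find_bad_status_lines` is hypothetical, and the two sample strings are constructed from the error message, not copied from a real file.

```python
# Standard Log4j 2 level names that a `status` entry is expected to hold.
VALID_LEVELS = {"OFF", "FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE", "ALL"}


def find_bad_status_lines(properties_text):
    """Return (line_number, value) for each `status` entry that is not a valid level."""
    bad = []
    for number, raw in enumerate(properties_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and properties-file comments.
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "status" and value.strip().upper() not in VALID_LEVELS:
            bad.append((number, value.strip()))
    return bad


# Assumed examples: a status line fused with a comment, and the same text split apart.
broken = "status = error# log action execution errors for easier debugging\n"
fixed = "status = error\n\n# log action execution errors for easier debugging\n"

print(find_bad_status_lines(broken))
print(find_bad_status_lines(fixed))
```

To run it against a real configuration, read the file into a string first. On an RPM install I would expect it at /etc/elasticsearch/log4j2.properties, but that path is an assumption.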
Please help me resolve this issue.