Elasticsearch suddenly stops

It works for 3 or 4 days and then suddenly stops working :frowning:

Here is the log:

[2017-10-31T02:21:34,345][INFO ][o.e.c.r.a.DiskThresholdMonitor] [loMT4go] low disk watermark [85%] exceeded on [loMT4goJQc6ws1s0AgTVcg][loMT4go][/var/lib/elasticsearch/nodes/0] free: 24gb[12.7%], replicas will not be assigned to this node
[2017-10-31T02:23:59,680][WARN ][o.e.t.TransportService   ] [loMT4go] Received response for a request that has timed out, sent [152729ms] ago, timed out [117491ms] ago, action [cluster:monitor/nodes/stats[n]], node [{loMT4go}{loMT4goJQc6ws1s0AgTVcg}{Aeu_RR6pQlKk7dIBAYqjxw}{127.0.0.1}{127.0.0.1:9300}], id [173264]
[2017-10-31T02:28:19,560][INFO ][o.e.c.r.a.DiskThresholdMonitor] [loMT4go] low disk watermark [85%] exceeded on [loMT4goJQc6ws1s0AgTVcg][loMT4go][/var/lib/elasticsearch/nodes/0] free: 24gb[12.7%], replicas will not be assigned to this node
[2017-10-31T02:26:44,684][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [loMT4go] failed to execute on node [loMT4goJQc6ws1s0AgTVcg]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [loMT4go][127.0.0.1:9300][cluster:monitor/nodes/stats[n]] request_id [173264] timed out after [35238ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:934) [elasticsearch-5.4.3.jar:5.4.3]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.3.jar:5.4.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
[2017-10-31T02:33:02,407][INFO ][o.e.c.r.a.DiskThresholdMonitor] [loMT4go] low disk watermark [85%] exceeded on [loMT4goJQc6ws1s0AgTVcg][loMT4go][/var/lib/elasticsearch/nodes/0] free: 24gb[12.7%], replicas will not be assigned to this node
[2017-10-31T02:33:29,505][WARN ][o.e.t.TransportService   ] [loMT4go] Received response for a request that has timed out, sent [102711ms] ago, timed out [78543ms] ago, action [cluster:monitor/nodes/stats[n]], node [{loMT4go}{loMT4goJQc6ws1s0AgTVcg}{Aeu_RR6pQlKk7dIBAYqjxw}{127.0.0.1}{127.0.0.1:9300}], id [173268]
[2017-10-31T02:32:13,332][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [loMT4go] failed to execute on node [loMT4goJQc6ws1s0AgTVcg]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [loMT4go][127.0.0.1:9300][cluster:monitor/nodes/stats[n]] request_id [173268] timed out after [24168ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:934) [elasticsearch-5.4.3.jar:5.4.3]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.3.jar:5.4.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
[...]
[2017-10-31T02:41:13,368][WARN ][o.e.a.a.c.n.s.TransportNodesStatsAction] [loMT4go] not accumulating exceptions, excluding exception from response
org.elasticsearch.action.FailedNodeException: Failed node [loMT4goJQc6ws1s0AgTVcg]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:246) [elasticsearch-5.4.3.jar:5.4.3]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:160) [elasticsearch-5.4.3.jar:5.4.3]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:218) [elasticsearch-5.4.3.jar:5.4.3]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1050) [elasticsearch-5.4.3.jar:5.4.3]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:933) [elasticsearch-5.4.3.jar:5.4.3]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.3.jar:5.4.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [loMT4go][127.0.0.1:9300][cluster:monitor/nodes/stats[n]] request_id [173270] timed out after [38400ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:934) ~[elasticsearch-5.4.3.jar:5.4.3]
        ... 4 more

I'm new to this stuff, please help.

The system isn't swapping and doesn't have too little memory:

free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        4.9G        256M         91M        2.7G        2.5G
Swap:            0B          0B          0B

df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           799M   83M  716M  11% /run
/dev/sda1       188G  154G   25G  87% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs           799M     0  799M   0% /run/user/0

uname -a
Linux amorebio 4.4.0-97-generic #120-Ubuntu SMP Tue Sep 19 17:28:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

Thank you,
Mike

It looks like you are running low on disk space, which is preventing Elasticsearch operations: /dev/sda1 is 87% used, past the default 85% low disk watermark, which is why the DiskThresholdMonitor keeps logging that replicas will not be assigned to this node.
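If freeing space on that volume is not immediately possible, you can also inspect and (transiently) raise the watermark thresholds through the cluster settings API. A minimal sketch, assuming Elasticsearch is reachable on localhost:9200; the percentages are illustrative, not recommendations:

# Show any non-default cluster settings currently applied:
curl -s 'localhost:9200/_cluster/settings?pretty'

# Transiently raise the disk watermarks (example values; running a data
# node this close to full is still risky):
curl -s -XPUT 'localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}'

Transient settings reset on a full cluster restart; use "persistent" instead if the change should survive one.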

Hello, I have a similar problem. For a few days now, the elasticsearch service has been stopping with:
"elasticsearch dead but pid file exists"
I enabled DEBUG logging, and the last lines are about:
[2017-11-28 12:02:15,395][DEBUG][indices.ttl ] [Loki] [idirect][3] purging shard
[2017-11-28 12:02:15,396][DEBUG][indices.ttl ] [Loki] [idirect][4] purging shard

Are there any other logs to check? I don't know why it stops.
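A couple of standard places to look when the service dies without an obvious error; the log path below assumes a default Linux package install and may differ on your system:

# Tail the Elasticsearch server log (the filename follows the cluster name):
tail -n 200 /var/log/elasticsearch/elasticsearch.log

# Check whether the kernel's OOM killer terminated the JVM, which leaves
# the pid file behind and matches "dead but pid file exists":
dmesg | grep -i -E 'killed process|out of memory'

If dmesg shows the Java process being killed, the machine likely ran out of memory and you will need to lower the heap size or add RAM.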
