Elasticsearch stops working suddenly

Hi,
Elasticsearch stopped working suddenly. Why does the service stop? I get the log below. How can I fix this problem? I use -Xmx10g, so it uses 10 GB of RAM.

[2019-05-02T10:11:44,459][WARN ][r.suppressed             ] [MasterNode128] path: /.kibana/_search, params: {rest_total_hits_as_int=true, size=1000, index=.kibana, from=0}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:293) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:133) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:254) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:101) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$1(InitialSearchPhase.java:209) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:188) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.6.1.jar:6.6.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_191]
	at java.lang.Thread.run(Unknown Source) [?:1.8.0_191]
[2019-05-02T10:11:46,194][WARN ][r.suppressed             ] [MasterNode128] path: /.kibana/doc/space%3Adefault, params: {index=.kibana, id=space:default, type=doc}
org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.kibana][doc][space:default]: routing [null]]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:230) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.start(TransportSingleShardAction.java:209) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:100) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:62) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.support.TransportAction.doExecute(TransportAction.java:143) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:87) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:76) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.client.support.AbstractClient.get(AbstractClient.java:502) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.rest.action.document.RestGetAction.lambda$prepareRequest$0(RestGetAction.java:81) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:97) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:240) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:336) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:174) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.http.netty4.Netty4HttpServerTransport.dispatchRequest(Netty4HttpServerTransport.java:551) [transport-netty4-client-6.6.1.jar:6.6.1]
	at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:137) [transport-netty4-client-6.6.1.jar:6.6.1]

[2019-05-02T10:11:47,710][INFO ][o.e.c.r.a.AllocationService] [MasterNode128] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[.kibana_1][0]] ...]).

Hi @khergner,

the log snippet does not contain clues as to why it stopped. Also, the symptom you see and the topology of your setup are not entirely clear to me:

  1. Is this a clustered setup or a single server dev setup? Which node is the log file from?
  2. Did the process stop? The entire cluster?
  3. Is there an error message returned from elasticsearch?
  4. Is the problem reproducible? If so, please describe the steps to reproduce.

Hi Henning,
First of all, thank you for your reply.

  1. Actually it is a single-server setup, but the server acts as a data collector and then pushes the data to Kibana for visualization.
    Architecture picture:
    [image: Untitled Diagram]

  2. Yes, the process stopped suddenly.

  3. No, I don't have an error message.

  4. Yes. First the service stops; when I start it again, I get error messages in the log file.
    Example:

     [2019-05-02T15:08:14,072][WARN ][r.suppressed             ] [MasterNode128] path: /.kibana/_search, params: {size=1000, ignore_unavailable=true, index=.kibana, filter_path=hits.hits._id}
     org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
     at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:166) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:152) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:297) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.action.search.TransportSearchAction.lambda$doExecute$4(TransportSearchAction.java:193) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:114) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:87) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:215) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:68) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:87) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:76) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:537) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.rest.action.search.RestSearchAction.lambda$prepareRequest$2(RestSearchAction.java:100) ~[elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:97) [elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:240) [elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:336) [elasticsearch-6.6.1.jar:6.6.1]
     at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:174) [elasticsearch-6.6.1.jar:6.6.1]

Hi @khergner,

it looks like either we could not read the node state from disk or the node thinks it is part of a cluster and is waiting for other nodes to join. It may be possible to see more from the complete log file.
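
To illustrate the second possibility, here is a minimal sketch that scans a flat `elasticsearch.yml`-style text for three 6.x settings that can make a lone node wait for peers when set above 1: `discovery.zen.minimum_master_nodes`, `gateway.recover_after_nodes` and `gateway.expected_nodes`. The function name `find_waiting_settings` and the sample config are hypothetical, and the line-based parsing is an assumption that only handles simple `key: value` lines, not nested YAML.

```python
# Settings (Elasticsearch 6.x) that can keep a single node waiting for peers
# when set above 1. The list is illustrative, not exhaustive.
PEER_SETTINGS = (
    "discovery.zen.minimum_master_nodes",
    "gateway.recover_after_nodes",
    "gateway.expected_nodes",
)


def find_waiting_settings(config_text):
    """Return {setting: value} for peer-related settings greater than 1."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in PEER_SETTINGS and value.isdigit() and int(value) > 1:
            found[key] = int(value)
    return found


# Hypothetical config for a node that was once part of a larger cluster.
sample = """
cluster.name: demo
node.name: MasterNode128
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 1
"""
print(find_waiting_settings(sample))
```

Any setting this reports is only a hint to look closer; the complete log file remains the better source of evidence.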

Hi Henning,
Thank you for your reply. I think so too. Elasticsearch reads from a shared network folder. I took a dump for Elasticsearch. I share the link below.

Hi @khergner,

does the log file cover a full server run, from starting it until it crashed? It looks like the server was started, then took 2-3 minutes to find the shards on disk, and finally was up and running OK. Until all shards are recovered, the errors from your previous posts are to be expected: the cluster is in RED state until at least one copy of every shard is recovered.
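
A minimal sketch of how a client could wait out that recovery window instead of querying a RED cluster. It relies on the cluster health API's `wait_for_status` and `timeout` parameters; the base URL `http://localhost:9200` and the helper names `is_searchable` and `fetch_health` are assumptions for illustration.

```python
import json
import urllib.request


def is_searchable(health):
    """True once the cluster has left RED (status is yellow or green)."""
    return health.get("status") in ("yellow", "green")


def fetch_health(base_url="http://localhost:9200", timeout_s=30):
    """Ask the cluster health API to block until status is at least yellow."""
    url = f"{base_url}/_cluster/health?wait_for_status=yellow&timeout={timeout_s}s"
    with urllib.request.urlopen(url, timeout=timeout_s + 5) as resp:
        return json.load(resp)


# Example responses, trimmed to the fields used here (values are made up).
print(is_searchable({"status": "red", "initializing_shards": 4}))
print(is_searchable({"status": "yellow", "unassigned_shards": 1}))
```

Calling `is_searchable(fetch_health())` right after a restart would return `False` for as long as the node is still locating its shards, which matches the 2-3 minute window seen in the log.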

Note that storing the data folder on a network share is not recommended. I think the startup time could be explained by that, and it would be good to try the same experiment with a local folder on the server.

If that log file includes the part where the Elasticsearch Java process stops, we are likely looking at an external factor killing it. Is this running on Linux or Windows? Do you have enough RAM to have a 10GB heap (would require at least 16GB, preferably more)?
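
As a rough check on the heap question, the sketch below applies a commonly cited rule of thumb, which is an assumption here rather than something stated in this thread: give the heap at most half of physical RAM, leaving the rest to the filesystem cache, and stay below roughly 32 GB. The function name `max_heap_gb` and the 31 GB ceiling are illustrative choices.

```python
def max_heap_gb(physical_ram_gb, ceiling_gb=31.0):
    """Heap ceiling under the 'half of RAM, below about 32 GB' rule of thumb."""
    return min(physical_ram_gb / 2, ceiling_gb)


# Does a 10 GB heap fit under this rule for a few RAM sizes?
for ram_gb in (16, 20, 24):
    limit = max_heap_gb(ram_gb)
    print(f"{ram_gb} GB RAM -> heap up to {limit} GB, 10 GB heap fits: {limit >= 10}")
```

Under this stricter rule a 10GB heap wants about 20GB of RAM, which is in line with the "preferably more" above.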

Hi Henning,
First of all, thank you for your reply. May I ask a question?
This is the current architecture:

  • 8 nodes (Metricbeat and Filebeat, "apache and nginx") + 1 management server (data, ingest, master) running

  • Data amount = 100 GB per day

What physical hardware should the management server have?

This is what I use now:
Processor: Intel Xeon CPU X5650 @ 2.67 GHz and 2.66 GHz
RAM: 24.0 GB
Server: Windows 2012
Disk: 1 TB

Finally, if you have documentation on this, could you share it with me?
Regards

Hi @khergner,

sizing is not an exact science and coming up with an answer to this will require experimentation. I recommend reading blogs on this, for instance this one:

Elasticsearch: The Definitive Guide also has relevant content on this:

https://www.elastic.co/guide/en/elasticsearch/guide/current/scale.html
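
As a starting point for that experimentation, here is a back-of-the-envelope retention estimate. It assumes the figures quoted above mean 100 GB of raw data per day on a 1 TB disk; the on-disk-to-raw ratio of 1.0 is a placeholder to be measured, and the 85% usable fraction mirrors the default low disk watermark. The function name `retention_days` is hypothetical.

```python
def retention_days(disk_gb, daily_raw_gb, replicas=0, index_ratio=1.0,
                   usable_fraction=0.85):
    """Days of data that fit on disk before the usable fraction is exhausted.

    index_ratio is on-disk index size divided by raw size (measure it; 1.0 is
    only a placeholder). replicas is the number of replica copies per shard.
    """
    daily_on_disk_gb = daily_raw_gb * index_ratio * (1 + replicas)
    return disk_gb * usable_fraction / daily_on_disk_gb


print(round(retention_days(1000, 100), 2))               # single node, no replicas
print(round(retention_days(1000, 100, replicas=1), 2))   # one replica halves it
```

Even with no replicas, these assumptions leave well under two weeks of retention on a single 1 TB disk, so the retention target matters as much as CPU and RAM.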

If you need high availability, you should have at least 3 nodes.
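
The reason for three comes from master election quorum. In 6.x, `discovery.zen.minimum_master_nodes` is typically set to a majority of the master-eligible nodes, that is (N / 2) + 1. A small sketch of that arithmetic, with hypothetical helper names:

```python
def minimum_master_nodes(master_eligible):
    """Majority quorum of master-eligible nodes: floor(N / 2) + 1."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """How many master-eligible nodes can be lost while keeping a quorum."""
    return master_eligible - minimum_master_nodes(master_eligible)


for n in (1, 2, 3):
    print(n, minimum_master_nodes(n), tolerated_failures(n))
```

With one or two master-eligible nodes no failure can be tolerated; three is the smallest cluster that survives the loss of one node.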

I hope your original issue was resolved; if not, please provide additional details.
