Elasticsearch Net, "all shards failed" when one node is shut down, 7.11.1

mbooh · August 17, 2021, 3:14pm

Hi!

I have a three-node cluster set up on version 7.11.1, with this configuration file:

bootstrap.memory_lock: false
cluster.initial_master_nodes:
  - STHLM-04
  - STHLM-05
  - STHLM-06
cluster.name: ELASTIC-TEST
discovery.seed_hosts:
  - STHLM-04
  - STHLM-05
  - STHLM-06
http.port: 9200
network.host: STHLM-04
node.max_local_storage_nodes: 1
node.name: STHLM-04
node.roles: [ master, data, ingest ]
path.data: D:\Elastic\ElasticSearch\Data
path.logs: D:\Elastic\ElasticSearch\Logs
transport.tcp.port: 9300
xpack.license.self_generated.type: basic
xpack.security.enabled: false

The other nodes' config is set up the same way, except for node.name and network.host.

Then I have a .NET project running Elasticsearch Net, and the connection is set up like this:

var nodeUrls = nodeUrl.Split(',');
var nodeUris = nodeUrls.Select(n => new Uri(n));

var nodes = nodeUris.Select(u => new Node(u));
var connectionPool = new StickyConnectionPool(nodes);

var settings = new ConnectionSettings(connectionPool)
    .DisableDirectStreaming()
    .BasicAuthentication(username, password);

var client = new CustomElasticClient(settings);

As long as all nodes are up and running it works perfectly, but if I shut down one of the Elasticsearch services I get the "all shards failed" error.

[2021-08-17T15:54:26,100][WARN ][r.suppressed             ] [STHLM-04] path: /webutv2_1/_search, params: {typed_keys=true, index=webutv2_1}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:601) [elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:332) [elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:636) [elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:415) [elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$0(AbstractSearchAsyncAction.java:240) [elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:308) [elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33) [elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732) [elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.11.1.jar:7.11.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.action.NoShardAvailableActionException
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:448) ~[elasticsearch-7.11.1.jar:7.11.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:397) [elasticsearch-7.11.1.jar:7.11.1]
	... 9 more

What am I missing? How do I get Elastic to run without error even if one node shuts down?
At this point I cannot, for example, upgrade without downtime, which is something you should be able to do, I guess.

Thanks!

/Kristoffer

leandrojmp · August 17, 2021, 3:25pm

Does your index have replicas?

Can you make the following request to check the index webutv2_1 ?

GET _cat/indices/webutv2_1?v&s=i

If your index does not have replicas, you will get this message when a node holding one of its shards goes down. To solve this, you need replicas.
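
For reference, one quick way to see where the shards of the index live, and whether any replicas exist, is the cat shards API. This is only a sketch; the column list is an illustrative choice and not something from the thread:

```
GET _cat/shards/webutv2_1?v&h=index,shard,prirep,state,node
```

A `p` in the prirep column marks a primary and an `r` marks a replica. If every row shows `p`, stopping the node that holds a primary makes that shard unavailable for search.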

mbooh · August 18, 2021, 7:21am

Hi!
It says rep = 0, so I guess there are no replicas.
I'm trying to find out how to add replicas to an existing index but I can't find anything. Could you guide me the right way? Should it be set up in elasticsearch.yml?

One other thing: when I run GET _cat/indices I see webutv2_1 and webutv2_2?
When pushing data using Elasticsearch Net I have defined the index name as just webutv2.

Thanks!

/Kristoffer

Christian_Dahlqvist · August 18, 2021, 7:39am

It is controlled through index settings, which can be updated on an existing index. If you want to change the default for new indices, you need to use an index template.
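
As a minimal sketch of both options: the replica count of an existing index can be changed with the update index settings API, and a composable index template can set the default for indices created later. The template name webutv2_template and the pattern webutv2_* are assumptions chosen for illustration and are not taken from the thread.

```
# Existing index: number_of_replicas is a dynamic setting
PUT /webutv2_1/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

# New indices matching the pattern (hypothetical template name and pattern)
PUT _index_template/webutv2_template
{
  "index_patterns": ["webutv2_*"],
  "template": {
    "settings": {
      "number_of_replicas": 1
    }
  }
}
```

Neither request involves elasticsearch.yml; index-level settings such as the replica count are applied through the API.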

mbooh · August 18, 2021, 8:13am

OK, and to get rid of the "shards" error, do I need to set it to at least one, or do I need to set it to two?

Christian_Dahlqvist · August 18, 2021, 8:14am

One should be sufficient. If you need additional resiliency you can however increase it.
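
Before stopping a node, it may help to confirm that the new replicas have actually been allocated. A hedged sketch using the cluster health API, where the 30s timeout is an arbitrary choice:

```
GET _cluster/health/webutv2_1?wait_for_status=green&timeout=30s
```

With three data nodes and one replica per shard, a green status means each shard has copies on two different nodes, so losing any single node should still leave one copy available for search.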

mbooh · August 18, 2021, 9:02am

Getting there!
After updating the replicas to 1 I can shut down STHLM-05 or STHLM-06 and everything works just fine.
However, if I shut down STHLM-04 and try to run:

GET /_cat/nodes

from Dev Tools I get:
{"statusCode":503,"error":"Service Unavailable","message":"License is not available."}

I guess there is some other configuration error somewhere. Otherwise, why must 04 always be running? They are all configured the same.

Thanks!

/Kristoffer

mbooh · August 18, 2021, 10:19am

Forget this one... Kibana is installed on STHLM-04 and will of course not work properly if I shut down 04...

system · September 15, 2021, 10:19am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
