ShardNotInPrimaryModeException when indexing to a cluster while shards move

zorah · September 17, 2020, 7:28am

When I add a new node to my Elasticsearch cluster, the cluster rebalances and shards move onto the new node. When I want to remove a node from the cluster, I force shards off the node before decommissioning it. Index requests are still being made to the cluster during both of these operations. Occasionally some shards take quite a long time to move (say, ~30-45 minutes), and indexing requests during this time are rejected with a ShardNotInPrimaryModeException that looks like the following:

java.lang.Exception: RemoteTransportException[[{NODE_NAME}][{NODE_IP}:9300][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[{NODE_NAME}][{NODE_IP}:9300][indices:data/write/bulk[s][p]]]; nested: RetryOnPrimaryException[shard is not in primary mode]; nested: ShardNotInPrimaryModeException[CurrentState[STARTED] shard is not in primary mode];
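For context on the node removal step described above: one common way to force shards off a node is a cluster-level allocation filter. The sketch below only builds the request bodies for `PUT _cluster/settings`. It assumes the node is excluded by name via `cluster.routing.allocation.exclude._name`; whether this is exactly how the drain is done in this cluster is an assumption, and the helper names and node name are hypothetical.

```python
import json


def exclude_node_settings(node_name):
    """Body for PUT _cluster/settings that asks the cluster to move shards off one node."""
    return {
        "transient": {
            # Shards are not allocated to nodes whose name matches this value.
            "cluster.routing.allocation.exclude._name": node_name
        }
    }


def clear_exclusion_settings():
    """Body that resets the filter; a null value removes a transient setting."""
    return {
        "transient": {
            "cluster.routing.allocation.exclude._name": None
        }
    }


if __name__ == "__main__":
    # "node-to-remove" is a hypothetical node name used only for illustration.
    print(json.dumps(exclude_node_settings("node-to-remove"), indent=2))
    print(json.dumps(clear_exclusion_settings(), indent=2))
```

The bodies would be sent with any HTTP client before decommissioning the node, and the filter cleared afterwards so it does not affect later allocation.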

The exception name implies that Elasticsearch is attempting to index to a shard while it is not considered the primary. Perhaps when a primary shard is being relocated, its primary status changes?

I haven't found much documentation or discussion about this online. I briefly skimmed the ES source code where this exception gets thrown and, as the name implies, it appears that some internal Elasticsearch state is at odds with the indexing operation being attempted.

Is this due to an error on my end (i.e., a client error) or an error on the ES side (e.g., some flavor of an IllegalStateException)? Are there steps I should be taking to avoid or mitigate these errors? I'm also a bit surprised that it takes so long for some of these shards to move. The shards in the cluster vary in size, but the culprits seem to be around 30 GB, which is within the target size for a shard.
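If these rejections turn out to be transient, one possible client-side mitigation is to resend only the failed bulk items after a delay. The sketch below is a rough illustration, not a confirmed fix: it assumes the bulk response reports per-item errors as objects with a `type` field, and the snake_case error type strings are my assumption based on the exception class names in the trace. The function names are hypothetical.

```python
# Assumed error type strings, derived from the exception class names in the trace.
RETRYABLE_TYPES = {
    "shard_not_in_primary_mode_exception",
    "retry_on_primary_exception",
}


def failed_retryable_positions(bulk_response, retryable_types=RETRYABLE_TYPES):
    """Return the positions of bulk items that failed with a retryable error type."""
    positions = []
    for pos, item in enumerate(bulk_response.get("items", [])):
        # Each item has a single key naming the action (index, create, update, delete).
        result = next(iter(item.values()))
        error = result.get("error")
        if isinstance(error, dict) and error.get("type") in retryable_types:
            positions.append(pos)
    return positions


def backoff_delays(attempts, base=1.0, cap=60.0):
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


if __name__ == "__main__":
    # Hand-written sample response, for illustration only.
    sample = {
        "errors": True,
        "items": [
            {"index": {"status": 201}},
            {"index": {"status": 503,
                       "error": {"type": "shard_not_in_primary_mode_exception"}}},
        ],
    }
    print(failed_retryable_positions(sample))
    print(backoff_delays(4))
```

Item positions in the response line up with the order of actions in the request, so the returned positions identify which documents to resend. Items that fail for other reasons (for example, mapping errors) are deliberately left out, since retrying them would not help.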

For reference, I'm running ES 7.6.2.

Thanks in advance!

