[BUG] A replica shard in POST_RECOVERY state, when promoted to primary, will be stuck

EDIT: I think I mixed up ShardRoutingState and IndexShardState.

I think this PR introduced a bug: "A replica can be promoted and started in one cluster state update" by bleskes (Pull Request #32042, elastic/elasticsearch on GitHub).

This commit may miss that currentRouting.initializing() does not cover IndexShardState.POST_RECOVERY. The PR was intended to fix this kind of bug for the general case, but it looks like this particular case might have been dropped in the refactor.
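To make the distinction from the EDIT above concrete, here is a minimal sketch of the two state machines as I understand them. The enums are simplified stand-ins that I wrote from memory, so the value lists should be checked against the version in use. The class name and the helper hasRoutingCounterpart are hypothetical and exist only for illustration.

```java
public class ShardStateSketch {
    // Simplified stand-in for the cluster-level view (ShardRoutingState).
    // Assumption: value list written from memory, not copied from the source tree.
    enum RoutingState { UNASSIGNED, INITIALIZING, STARTED, RELOCATING }

    // Simplified stand-in for the node-local lifecycle (IndexShardState).
    // Assumption: value list written from memory, not copied from the source tree.
    enum LocalState { CREATED, RECOVERING, POST_RECOVERY, STARTED, CLOSED }

    // Hypothetical helper: does a routing state with the same name exist?
    static boolean hasRoutingCounterpart(LocalState local) {
        for (RoutingState routing : RoutingState.values()) {
            if (routing.name().equals(local.name())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Print which local states have a same-named routing state.
        for (LocalState local : LocalState.values()) {
            System.out.println(local + " -> " + hasRoutingCounterpart(local));
        }
    }
}
```

In this toy model POST_RECOVERY exists only on the node-local side, so a check written purely against the routing state cannot name it directly. Whether that is what actually goes wrong in the real code path is exactly the open question of this post.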

Steps to reproduce

(Non-deterministic)

After a restart of a 4-node cluster (the index of interest is a single-shard index with replicationFactor = 2), we saw all replicas of that shard stuck in "INITIALIZING", while the primary shard was in the "STARTED" state.

Indexing kept failing with:

{"type":"retry_on_primary_exception","reason":"shard is not in primary mode","index":"***","shard":"0","index_uuid":"***","caused_by":{"type":"shard_not_in_primary_mode_exception","reason":"CurrentState[STARTED] shard is not in primary mode","index":"***","shard":"0","index_uuid":"***"}

It appeared that the IndexShard's shardRouting.primary was set to true, but replicationTracker.primary was not. Tracing the code, this looks like the bug?
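The symptom can be sketched with a toy model. This is not the real IndexShard or ReplicationTracker code: ToyShard, its fields, and its methods are hypothetical names that only echo the two flags mentioned above, under the assumption that a write needs both of them to be true.

```java
public class PrimaryModeSketch {
    // Toy stand-in for one shard copy; not the real classes.
    static final class ToyShard {
        boolean routingPrimary;     // echoes shardRouting.primary
        boolean trackerPrimaryMode; // echoes replicationTracker.primary

        // Promotion as seen through the routing table only.
        void applyRoutingPromotion() {
            routingPrimary = true;
        }

        // Separate step that must also happen before writes are accepted.
        void activatePrimaryMode() {
            trackerPrimaryMode = true;
        }

        // Assumption: a write is accepted only when both flags are true.
        String index() {
            if (!routingPrimary) {
                return "rejected: not the primary";
            }
            if (!trackerPrimaryMode) {
                return "rejected: shard is not in primary mode";
            }
            return "ok";
        }
    }

    public static void main(String[] args) {
        ToyShard shard = new ToyShard();
        shard.applyRoutingPromotion();     // routing says primary
        System.out.println(shard.index()); // tracker flag still false
        shard.activatePrimaryMode();       // the step that seems to be skipped
        System.out.println(shard.index());
    }
}
```

If the promotion path flips only the first flag, every write is rejected until something activates primary mode, which matches the repeated retry_on_primary_exception we saw.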

Version: OpenSearch 2.19.1 (this code path is still in Elasticsearch master too)

OpenSearch/OpenDistro are AWS-run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance. See "What is OpenSearch and the OpenSearch Dashboard?" on the Elastic site for more details.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )