Restart node after 15 mins

Hi,

We want to take a node offline for 15 minutes and then bring it back into the cluster. This is the approach we are following:

  1. Disable shard allocation except for primaries
  2. Perform a flush
  3. Shut down the Elasticsearch service and the node
  4. Perform the node operations (about 15 minutes)
  5. Restart the node and the Elasticsearch service
  6. Re-enable shard allocation ("all") once the node rejoins the cluster
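The API calls behind steps 1, 2 and 6 can be sketched in Python as follows. This is a hedged sketch, not an official client: it assumes an unauthenticated cluster at a hypothetical `ES_URL`, uses the standard-library `urllib` rather than the official Elasticsearch client, and builds `(method, path, body)` tuples for the `PUT _cluster/settings` and `POST _flush` requests. Steps 3 to 5 happen outside Elasticsearch and are not shown.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical endpoint; adjust for your cluster


def disable_allocation_except_primaries():
    # Step 1: while the node is down, only primaries may be allocated.
    return ("PUT", "/_cluster/settings",
            {"persistent": {"cluster.routing.allocation.enable": "primaries"}})


def flush_all_indices():
    # Step 2: a flush commits recent operations to Lucene, which can make
    # the later recovery of the restarted node faster.
    return ("POST", "/_flush", None)


def reenable_allocation():
    # Step 6: null removes the override, so the default ("all") applies again.
    return ("PUT", "/_cluster/settings",
            {"persistent": {"cluster.routing.allocation.enable": None}})


def send(method, path, body=None, base_url=ES_URL):
    """Send one request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(*disable_allocation_except_primaries())` would apply step 1. Setting `"all"` explicitly in step 6 also works; resetting to `null` simply avoids leaving a persistent override behind.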

Cluster information:

  1. Elasticsearch version: 8.12.2
  2. Data size on the node: 2.7 TB
  3. Number of shards per node: 93
  4. Active indexing and searching will continue on the cluster throughout

My understanding is that when we disable shard allocation except for primaries, the replicas that were on the restarted node will become unassigned. When the node comes back online, those replica shards will be allocated to it again and the cluster will turn green. The node will then hold only replica shards.
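One way to follow the cluster through this sequence is the cluster health API (`GET _cluster/health`). The sketch below assumes the response has already been decoded into a dict and reads its `status`, `number_of_nodes`, `unassigned_shards` and `initializing_shards` fields; `expected_nodes` is a hypothetical input holding the node count before the restart. The state labels are illustrative, not Elasticsearch terminology.

```python
def restart_state(health: dict, expected_nodes: int) -> str:
    """Classify the cluster's progress through the restart (illustrative labels)."""
    if health["status"] == "red":
        # A primary is unassigned; the restart procedure itself should not cause this.
        return "primary missing"
    if health["number_of_nodes"] < expected_nodes:
        # Node still offline: the replicas it held are unassigned (usually yellow).
        return "node offline"
    if health["status"] == "green":
        # Every replica is assigned and started again.
        return "recovered"
    # Node has rejoined, but replicas are still unassigned (allocation limited
    # to primaries) or initializing (recovery running after allocation is re-enabled).
    return "recovering"


def pending_shards(health: dict) -> int:
    """Shard copies not yet started: unassigned plus initializing."""
    return health["unassigned_shards"] + health["initializing_shards"]
```

While allocation is still restricted to primaries, the rejoined node's replicas stay unassigned, so this would report "recovering" until allocation is re-enabled and recovery finishes.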

So I have a few questions:

  1. After how long does a shard copy on the offline node become stale?
  2. I believe data will be copied to the restarted node to bring its replicas back in sync with the primaries, because indexing continues while it is offline. Please confirm.
  3. Is the recovery duration a function of data size? How can we estimate how long it takes for the unassigned shards to be allocated and their data to be copied?
  4. Is it advisable to set cluster routing allocation to "new_primaries", given that the node will be offline for 15 minutes while indexing continues?

Reply:

  1. Marking a shard as stale is not a time-based thing. It happens as soon as the primary processes an operation that isn't on the replica.

  2. Confirmed.

  3. It's mostly determined by the amount of data that changed while the node was down, not by the total data size. If there were no writes, recovery should be very fast; if you wrote GiBs of data, it will take longer.

  4. Yes, that's part of the documented process.
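For question 3, a rough back-of-envelope estimate can be sketched as below. It is an assumption, not a model of how Elasticsearch actually performs recoveries: it treats the changed data as bytes copied at the per-node recovery throttle `indices.recovery.max_bytes_per_sec` (40mb by default on most node types in 8.x), and `changed_bytes_from_rate` is a hypothetical helper that assumes a steady write rate. A real recovery may replay operations instead of copying files, so actual progress is best observed with `GET _cat/recovery?active_only=true`.

```python
MIB = 1024 * 1024


def estimate_recovery_seconds(changed_bytes: float,
                              max_bytes_per_sec: float = 40 * MIB) -> float:
    """Rough estimate: changed data divided by the recovery throttle (assumption)."""
    if max_bytes_per_sec <= 0:
        raise ValueError("max_bytes_per_sec must be positive")
    return changed_bytes / max_bytes_per_sec


def changed_bytes_from_rate(write_bytes_per_sec: float, outage_sec: float) -> float:
    """Hypothetical helper: data written during the outage at a steady rate."""
    return write_bytes_per_sec * outage_sec
```

With a hypothetical steady write rate of 1 MiB/s, a 15-minute outage gives 900 MiB of changed data, or about 22.5 seconds at 40 MiB/s. Copying the node's whole 2.7 TB (taken as 2.7 × 10^12 bytes) at the same rate would take roughly 18 hours, which is why the size of the delta, not the total data size, dominates.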