I have two Elasticsearch clusters deployed on Amazon EKS in two different regions.
I replicate data from east1 to east2 using CCR, and I only keep the indices in east2 (the destination cluster) for 2 days. Per the ILM policy, which applies to all indices, an index moves to warm 0 days after rollover (with forcemerge and readonly) and is then deleted after 1 day. The DR cluster has only 3 hot and 3 warm nodes. Total shards per node is set to 4 for every index, and most indices have 6 primaries and 1 replica (12 shards in total).
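For reference, the ILM policy described above can be sketched as the request body for `PUT _ilm/policy/<policy-name>` (the policy name is a placeholder). This is a minimal sketch under assumptions: `max_num_segments=1` is a guess because the post does not state the target segment count, and the hot phase (rollover) is left out because its conditions are not given.

```python
import json


def build_dr_policy(max_num_segments: int = 1) -> dict:
    """Hypothetical ILM policy body mirroring the setup described in the post.

    Assumptions: max_num_segments defaults to 1 and the hot/rollover phase
    is omitted because its conditions are unknown.
    """
    return {
        "policy": {
            "phases": {
                # Enter warm immediately (0 days after rollover), then
                # force merge and make the index read-only.
                "warm": {
                    "min_age": "0d",
                    "actions": {
                        "forcemerge": {"max_num_segments": max_num_segments},
                        "readonly": {},
                    },
                },
                # Delete the index 1 day after rollover.
                "delete": {
                    "min_age": "1d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent with PUT _ilm/policy/<policy-name>
    print(json.dumps(build_dr_policy(), indent=2))
```

Writing the policy out like this makes it easy to compare against `GET _ilm/policy` on the DR cluster.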
However, I see indices from 4 or even 5 days ago stuck in forcemerge. I used to have many more old ones; I recently did a cleanup and started observing.
I can't seem to find any log that actually reports an error on the force merge.
What should I be looking for here?
The only changes after which I observed the indices being deleted are:
I increased the total shards per node on a single index from 4/6 to 10, and then manually triggered a force merge with POST //_forcemerge. The next day the index was gone, as it should have been deleted long before.
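The shard arithmetic behind the per-node limit can be checked with a small sketch. It assumes that only the 3 warm nodes are eligible to hold the index once it is in the warm phase, and it only counts slots; it does not prove that the limit is what blocks the force merge. The settings body uses the real index setting `index.routing.allocation.total_shards_per_node`; `<index>` in the comment is a placeholder.

```python
def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies of one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def spare_slots(primaries: int, replicas: int, nodes: int,
                total_shards_per_node: int) -> int:
    """Per-node slots left over for this index across the eligible nodes.

    Zero means every slot is used; a negative value means the index
    cannot be fully allocated under the limit.
    """
    return nodes * total_shards_per_node - shard_copies(primaries, replicas)


def settings_body(total_shards_per_node: int) -> dict:
    """Body for PUT /<index>/_settings that changes the per-node limit."""
    return {"index.routing.allocation.total_shards_per_node": total_shards_per_node}


if __name__ == "__main__":
    # 6 primaries + 1 replica on 3 warm nodes
    print(spare_slots(6, 1, 3, 4))   # prints 0: no headroom at limit 4
    print(spare_slots(6, 1, 3, 10))  # prints 18: headroom at limit 10
```

With a limit of 4, the 12 copies fill all 12 slots on the three warm nodes exactly, so there is no room for a shard to sit on a different node while relocating. With a limit of 10 there are 18 spare slots.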
I would like to identify the root cause of why the indices are getting stuck, and the right way to resolve the issue.
At this stage, I am not sure whether just updating the number of shards per node, or manually triggering the force merge after the shard count change, is the solution.
Is there any way to find the log or reason why an index is stuck in forcemerge?
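One place that usually records why an ILM step is not progressing is the ILM explain API (`GET */_ilm/explain`, or `GET */_ilm/explain?only_errors=true` to list only indices in an error state). The sketch below filters a saved explain response down to indices whose current action is `forcemerge` and reports the current step, the failed step (if any), how long the index has been in that step, and `step_info`. It assumes the response was saved to a file; `load_explain`, `stuck_in_forcemerge`, and the file name are hypothetical helpers for illustration.

```python
import json
import time
from typing import Any, Dict, List, Optional, Tuple

# (index, step, failed_step, hours_in_current_step, step_info)
Row = Tuple[str, Optional[str], Optional[str], Optional[float], Any]


def load_explain(path: str) -> Dict[str, Any]:
    """Load a saved response of GET */_ilm/explain from a file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def stuck_in_forcemerge(explain: Dict[str, Any], now_ms: Optional[int] = None,
                        min_hours: float = 0.0) -> List[Row]:
    """List indices whose current ILM action is forcemerge.

    When a step has failed, ILM reports step == "ERROR" and names the
    failing step in failed_step; step_info carries the reason.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    rows: List[Row] = []
    for name, info in explain.get("indices", {}).items():
        if info.get("action") != "forcemerge":
            continue
        step_ms = info.get("step_time_millis")
        hours = None if step_ms is None else (now_ms - step_ms) / 3_600_000
        # Skip indices that entered their current step only recently
        if hours is not None and hours < min_hours:
            continue
        rows.append((name, info.get("step"), info.get("failed_step"),
                     hours, info.get("step_info")))
    return sorted(rows, key=lambda r: r[0])
```

For example, `for row in stuck_in_forcemerge(load_explain("explain.json"), min_hours=24): print(row)` lists indices that have sat in the same forcemerge step for more than a day, where `explain.json` is a hypothetical file saved from the explain call.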