Sure. It means that Elasticsearch tried to replicate an operation on a shard (e.g. indexing a document) but one of the replicas was unavailable at the time. That replica had previously been available and in-sync, so it has to be removed from the in-sync set and marked as stale. It normally happens if a node drops off the cluster but the cluster hasn't yet reassigned all its shards elsewhere, and then you try to write to one of the missing shards. As this is logged at WARN level, it isn't expected to happen in a healthy cluster.
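To illustrate the bookkeeping involved, here's a minimal sketch (not Elasticsearch's actual code, and with illustrative names) of how a primary might maintain its in-sync set: a replica that can't acknowledge a replicated operation has missed a write, so it leaves the in-sync set and is marked stale.

```python
def replicate(op, in_sync_replicas, reachable_replicas):
    """Replicate `op` to every in-sync replica; demote unreachable ones.

    Returns the set of replicas that were marked stale by this operation.
    """
    stale = set()
    for replica in set(in_sync_replicas):
        if replica in reachable_replicas:
            pass  # replica applied the op and acknowledged it
        else:
            # The replica was in-sync but is unavailable right now, so it
            # misses this op and must leave the in-sync set (this is the
            # point at which the WARN message would be logged).
            in_sync_replicas.discard(replica)
            stale.add(replica)
    return stale

in_sync = {"replica-0", "replica-1"}
stale = replicate("index doc 1", in_sync, reachable_replicas={"replica-0"})
print(sorted(in_sync))  # ['replica-0']
print(sorted(stale))    # ['replica-1']
```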
I can't see any more useful logging to add for this beyond what we already log about nodes joining and leaving the cluster and about changes in cluster health.
Correct, it failed in the finalisation stage, not the translog stage as you had said:
It's waiting for the recovering shard's local checkpoint to catch up with the global checkpoint, which would indicate that it has processed all the operations in the translog and can now be marked as in-sync.
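As a rough sketch of that condition (illustrative only, not the real implementation): the recovering shard replays translog operations in sequence-number order, each processed operation advances its local checkpoint, and it only qualifies as in-sync once the local checkpoint has caught up with the global checkpoint.

```python
def replay_translog(local_checkpoint, global_checkpoint, translog_ops):
    """Replay ops in sequence-number order; return the new local checkpoint
    and whether the shard has caught up with the global checkpoint."""
    for seq_no in sorted(translog_ops):
        if seq_no == local_checkpoint + 1:
            # Operation processed: the local checkpoint advances to cover it.
            local_checkpoint = seq_no
    return local_checkpoint, local_checkpoint >= global_checkpoint

local, in_sync = replay_translog(local_checkpoint=4,
                                 global_checkpoint=7,
                                 translog_ops=[5, 6, 7])
print(local, in_sync)  # 7 True
```

If an operation is missing (say the shard only receives ops 5 and 6 when the global checkpoint is 7), the local checkpoint stalls below the global checkpoint and finalisation keeps waiting, which is the situation described here.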
Looking again at the hot threads output, it seems that the generic threadpool is completely full of threads stuck here, presumably blocking whatever would actually cause these checkpoints to advance; I've not investigated exactly what. As far as I can tell there's one of these actions per recovery, so this also looks like a consequence of the earlier excess of recoveries.
It does look like removing the replicas from all affected indices will unblock these actions, as will restarting the node, but I can't say what other problems might be lurking in this cluster. It has got into a very bad state, and I recommend a full cluster restart.
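For reference, dropping the replicas on an affected index is a dynamic settings update like the following (`my-index` is a placeholder; restoring replicas later is the same call with the original replica count):

```
PUT /my-index/_settings
{
  "index": { "number_of_replicas": 0 }
}
```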