"operation primary term [1] is too old (current [2])"

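The "operation primary term [1] is too old (current [2])" message is a consequence of the node holding the primary shard leaving the cluster and a replica shard being promoted to primary in its place. The promotion increments the shard's primary term, so operations still in flight from the old primary carry term 1 and are rejected by copies that have already moved to term 2.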
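To make the mechanism concrete, here is a toy model in Python. It is only a sketch of the idea, not Elasticsearch's actual code: the names `ShardCopy`, `promote`, `apply` and `StaleTermError` are made up for illustration, and the real implementation does a lot more than compare two integers.

```python
class StaleTermError(Exception):
    """Raised when an operation carries an outdated primary term (hypothetical name)."""


class ShardCopy:
    """Toy stand-in for one shard copy; not an Elasticsearch class."""

    def __init__(self, primary_term=1):
        self.primary_term = primary_term
        self.ops = []

    def promote(self):
        # A replica being promoted to primary bumps the primary term.
        self.primary_term += 1

    def apply(self, op_term, op):
        # Operations stamped by a previous primary are rejected.
        if op_term < self.primary_term:
            raise StaleTermError(
                f"operation primary term [{op_term}] is too old "
                f"(current [{self.primary_term}])"
            )
        self.ops.append(op)


copy = ShardCopy(primary_term=1)
copy.apply(1, "index doc A")      # accepted: terms match
copy.promote()                    # old primary's node left; term is now 2
try:
    copy.apply(1, "index doc B")  # late operation from the old primary
except StaleTermError as e:
    message = str(e)
```

The last call fails with the same wording as the message in your logs, which is why that message usually points at a primary failover rather than at a problem with the operation itself.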

A ≥30s GC pause would cause the master to consider the node unhealthy and remove it from the cluster.
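As a rough illustration of where the 30s figure comes from: assuming the default fault detection settings (`cluster.fault_detection.follower_check.timeout` of 10s and `cluster.fault_detection.follower_check.retry_count` of 3), a node has to miss three consecutive checks before it is removed. The helper names below are hypothetical, and the calculation ignores the check interval and network delays, so treat it as back-of-the-envelope arithmetic only.

```python
def removal_budget_seconds(timeout_s=10.0, retry_count=3):
    # Approximate time a node can stay unresponsive before the master
    # gives up on it: every retry has to time out in turn.
    return timeout_s * retry_count


def pause_risks_removal(pause_s, timeout_s=10.0, retry_count=3):
    # True if a stop-the-world pause is long enough to fail every check.
    return pause_s >= removal_budget_seconds(timeout_s, retry_count)


budget = removal_budget_seconds()        # 30.0 with the assumed defaults
short_pause = pause_risks_removal(5)     # False
long_pause = pause_risks_removal(35)     # True
```

If you have changed those settings, plug your own values in; the point is just that the pause has to outlast all the retries, not a single check.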

The main question is why it had a ≥30s GC pause. Perhaps this is snapshot-related, but your steady-state heap usage looks very high anyway, so maybe the snapshot just pushed it over the edge. In any case you are running a very old version, long past EOL, and there have been significant improvements to memory usage in the ~2 years since 7.6 was released, so an upgrade is overdue and strongly recommended.
