ES 7.5 translog recovery is extremely slow

Thanks, that's helpful. Here are some edited highlights:

    "commit": {
      "user_data": {
        "local_checkpoint": "212661",
        "min_retained_seq_no": "178451",
        "max_seq_no": "212732"
      },
      "num_docs": 47198999
    },
    "seq_no": {
      "max_seq_no": 215715,
      "local_checkpoint": 215711,
      "global_checkpoint": 215579
    },
    "retention_leases": {
      "primary_term": 1,
      "version": 287,
      "leases": [
        {
          "id": "peer_recovery/up6P4q_ARIO1W7Wewt5s7g",
          "retaining_seq_no": 178451,
          "timestamp": 1579646940678,
          "source": "peer recovery"
        },
        ...
      ]
    },

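In particular, "max_seq_no": 215715 means that this shard contains only roughly 215715 operations. The target of the recovery in question is up6P4q_ARIO1W7Wewt5s7g, and the lease for that node has "retaining_seq_no": 178451, indicating that the target is missing 37264 of them (215715 - 178451). But this disagrees with "num_docs": 47198999, which is far more documents than there are operations. I think this means you have a lot of nested documents? Each nested object is indexed as a separate hidden Lucene document, so a single operation can produce many documents.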
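To make the arithmetic explicit, here is a small sketch using the numbers from the stats above. It follows the approximation in the text and treats max_seq_no as the operation count (sequence numbers start at 0, so the true count is one higher, which doesn't change the conclusion). The variable names are just for illustration.

```python
# Values copied from the shard stats quoted above.
max_seq_no = 215715        # highest sequence number on this shard (~ operation count)
retaining_seq_no = 178451  # retention lease held for the recovery target
num_docs = 47198999        # Lucene document count from the last commit

# Operations the target still needs, and what fraction of the shard that is.
missing_ops = max_seq_no - retaining_seq_no
missing_fraction = missing_ops / max_seq_no

# How many Lucene documents each operation produces on average;
# a large ratio suggests many nested documents per top-level document.
docs_per_op = num_docs / max_seq_no

print(f"missing operations: {missing_ops}")              # 37264
print(f"fraction of shard missing: {missing_fraction:.1%}")  # 17.3%
print(f"Lucene docs per operation: {docs_per_op:.0f}")   # 219
```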

Looking at the recovery stats:

        "translog" : {
          "recovered" : 16507,
          "total" : 2870006,
          "percent" : "0.6%",
          "total_on_start" : -1,
          "total_time_in_millis" : 519729
        },

The total is supposed to be the number of operations to recover, which should be in the region of 37264, but instead it is much larger than the total number of operations in this shard. I suspect we might be calculating the total incorrectly in the presence of nested documents, perhaps counting documents instead of operations. If so, this recovery only needs to replay ~37000 operations, so it's around 44% complete (16507 of 37264). If you let it run for another 10-15 minutes, does it finish? @nhat does that sound like a plausible explanation to you?
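A rough sketch of where the "44%" and "another 10-15 minutes" estimates come from, using the recovery stats above. It assumes the real total is the 37264 missing operations and that the replay rate stays constant for the rest of the recovery; neither assumption is confirmed by the stats themselves.

```python
# Values copied from the translog recovery stats quoted above.
recovered = 16507
reported_total = 2870006
elapsed_ms = 519729

# Assumption: the true total is the number of missing operations from the lease.
expected_total = 37264

reported_pct = recovered / reported_total  # what the stats show
actual_pct = recovered / expected_total    # progress if the total is miscounted

# Assumption: operations keep being replayed at the rate observed so far.
ms_per_op = elapsed_ms / recovered
remaining_minutes = (expected_total - recovered) * ms_per_op / 60_000

print(f"reported progress: {reported_pct:.1%}")
print(f"progress against missing ops: {actual_pct:.0%}")
print(f"estimated time remaining: {remaining_minutes:.0f} minutes")
```

At the observed rate of roughly 30 operations per second, the remaining ~20700 operations take about 11 minutes, which is where the 10-15 minute guess comes from.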

Since this replica is missing ~17% of the operations in this shard, Elasticsearch would normally consider it more efficient simply to copy the files over again. Unfortunately, for tricky technical reasons, if you're using nested documents then that calculation is skewed in favour of replaying the translog.
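To illustrate how the skew could arise, here is a purely hypothetical sketch of a "copy files or replay operations" rule that compares the operations to replay against the size of the shard. The function name `prefer_file_copy` and the 10% cut-off are made up for illustration; this is not Elasticsearch's actual implementation. It only shows that sizing the shard by Lucene documents rather than by operations makes the same 37264 operations look negligible.

```python
def prefer_file_copy(ops_to_replay: int, shard_size: int, threshold: float = 0.1) -> bool:
    """Hypothetical rule: copy files if replay would cover more than `threshold` of the shard."""
    return ops_to_replay > threshold * shard_size


ops_to_replay = 37264

# Sized by operations (max_seq_no): ~17% missing, so copying files wins.
print(prefer_file_copy(ops_to_replay, shard_size=215715))    # True

# Sized by Lucene documents (num_docs, inflated by nested docs): ~0.08% missing,
# so replaying the translog looks cheaper.
print(prefer_file_copy(ops_to_replay, shard_size=47198999))  # False
```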
