Failed to create shard, failure IOException[failed to obtain in-memory shard lock]

We just rebuilt our cluster and are running into a problem we have not had before. We see almost constant shard relocation. During this process we eventually (every couple of hours) end up with a shard or two stuck in Uninitialized. From _cluster/allocation/explain?pretty, we get:

{
  "index": "blah",
  "shard": 71,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2020-06-12T19:23:11.894Z",
    "failed_allocation_attempts": 5,
    "details": "failed shard on node [-zoTGYAhSuOvdfXm9WnAdw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[foo_index][71]: obtaining shard lock timed out after 5000ms]; ",
    "last_allocation_status": "no_attempt"
  },
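For what it's worth, we can also ask about a specific copy directly instead of taking whichever unassigned shard the API picks first, e.g. (the index and shard here just mirror the output above):

POST /_cluster/allocation/explain
{
  "index": "blah",
  "shard": 71,
  "primary": false
}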

When we see this, running the following always clears it:

POST /_cluster/reroute?retry_failed=true

If we leave it long enough, however, we end up with all copies of a shard offline and the cluster goes Red.
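To see at a glance which shards are affected, we check something like this (the column list is just what we find useful, nothing special):

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state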

What might cause this constant churn of things moving around?

We did have node_left delay set to 5 min, and saw this once:

"allocate_explanation": "cannot allocate because the cluster is still waiting 2.9m for the departed node holding a replica to rejoin, despite being allowed to allocate the shard to at least one other node",

So I changed that to 2m and bumped the retry limit to 10, but we are still hitting the same error.
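For reference, this is roughly how we are applying those settings (we set them across all indices via _all, though a per-index target would also work):

PUT /_all/_settings
{
  "index.unassigned.node_left.delayed_timeout": "2m",
  "index.allocation.max_retries": 10
}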

Looking for ideas on where to start looking.

Thanks

Actually, I am not sure about the order of operations here. Looking at some telemetry, it looks like the Uninitialized state is what starts it.

In this chart:
Red --> Uninitialized
Orange --> Initializing
Yellow --> Relocating

it looks like the shards first go Uninitialized, then Initializing, then Relocating. I guess that makes sense, but I am not sure why the initial Uninitialized happened.

The large spike at the right happened and resolved with no shards getting stuck in Uninitialized (i.e. all green after that).
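While one of these spikes is in flight we can watch the ongoing recoveries with something like:

GET _cat/recovery?v&active_only=true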

Are nodes leaving the cluster and then immediately rejoining? Look for messages from the MasterService (on the elected master) about that. That'd explain shards suddenly becoming uninitialized, and also the failed to obtain in-memory shard lock message.
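If you are not sure which node is currently the elected master (i.e. whose logs to check), this will tell you:

GET _cat/master?v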

I currently can't get to the logs on the local node. In this case, would it help to set index.unassigned.node_left.delayed_timeout to something really small? We usually run with a longer delay (5m, though I changed it to 2m) to cope with node reboots after patching, but I don't think that is happening here.

Not really; if your nodes aren't staying in the cluster then all sorts of other things will misbehave too. The fix is to keep the nodes in the cluster.

Troubleshooting this without logs is pretty much impossible so that's the first priority.
