Recover cluster after adding a data node

I added a data node to my ES cluster, which already had 4 nodes. After the addition, the number of shards jumped from 4 (2 primary, 2 replica) to 8 (4 primary, 4 replica). The 4 newly created shards are unassigned and my cluster health is red.

{
  "cluster_name": "dr-logcluster",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 10,
  "number_of_data_nodes": 5,
  "active_primary_shards": 573,
  "active_shards": 699,
  "relocating_shards": 0,
  "initializing_shards": 2,
  "unassigned_shards": 1762,
  "delayed_unassigned_shards": 448,
  "number_of_pending_tasks": 4,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 315,
  "active_shards_percent_as_number": 28.380024360535934
}

How should I recover my cluster?

What do you see when you run this?

GET /_cluster/allocation/explain?pretty

It will tell you why shards cannot be allocated.
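
If you would rather script the call than paste it into Kibana, here is a minimal sketch using only the Python standard library. The base URL `http://localhost:9200` and the helper name `build_explain_request` are assumptions for illustration, and the sketch assumes no authentication. Called without an index, the API explains the first unassigned shard it finds; with `index`, `shard` and `primary` in the body, it explains that specific shard.

```python
import json
import urllib.request


def build_explain_request(base_url, index=None, shard=None, primary=None):
    """Build a request for the cluster allocation explain API.

    base_url is an assumption (e.g. http://localhost:9200); adjust it and
    add authentication headers as your cluster requires.
    """
    url = base_url.rstrip("/") + "/_cluster/allocation/explain?pretty"
    if index is None:
        # No body: Elasticsearch picks an unassigned shard to explain.
        return urllib.request.Request(url, method="GET")
    body = {"index": index, "shard": shard, "primary": primary}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_explain_request("http://localhost:9200")
    # Uncomment to send the request to a reachable cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
    print(req.get_method(), req.full_url)
```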


If I'm not mistaken, you have around 500 shards per data node. Is that correct?
How much heap do you have per data node?
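
For reference, the estimate can be reproduced from the cluster health output posted above. This is a small sketch of the arithmetic only; it assumes that active, initializing and unassigned shards together make up the total (relocating shards are 0 here) and that shards would be spread evenly over the data nodes.

```python
# Values copied from the cluster health output in the original post.
health = {
    "number_of_data_nodes": 5,
    "active_shards": 699,
    "initializing_shards": 2,
    "unassigned_shards": 1762,
}


def total_shards(h):
    """Total shard copies the cluster is trying to hold (assumed breakdown)."""
    return h["active_shards"] + h["initializing_shards"] + h["unassigned_shards"]


def shards_per_data_node(h):
    """Average shard copies per data node if everything were assigned evenly."""
    return total_shards(h) / h["number_of_data_nodes"]


def active_percent(h):
    """Share of shard copies that are currently active, in percent."""
    return 100.0 * h["active_shards"] / total_shards(h)


if __name__ == "__main__":
    print(total_shards(health))
    print(round(shards_per_data_node(health), 1))
    print(active_percent(health))
```

The computed active percentage agrees with the `active_shards_percent_as_number` value the cluster reported, which suggests the breakdown above is the right one.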

Please format your code, logs or configuration files using the </> icon as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

The problem is:

cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster

Which means that you lost a primary shard.

{
  "index": "error-administrator-2019.06.21",
  "shard": 1,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "at": "2019-07-16T05:03:23.549Z",
    "details": "node_left[0MdxLk76SNCwFezbM8Uybw]",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions": [
    {
      "node_id": "0MdxLk76SNCwFezbM8Uybw",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd02.nfvi.localdomain",
      "transport_address": "172.17.41.146:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "in_sync": false,
        "allocation_id": "OIUY3wBkREK9plyiLOIsnw"
      }
    },
    {
      "node_id": "7Vhv_D4oR_2KtDQpWuxtPw",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd05.nfvi.localdomain",
      "transport_address": "172.17.41.149:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "B2GcZ5KFRG23IMePeJWA4g",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd01.nfvi.localdomain",
      "transport_address": "172.17.41.145:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "cC9S519DT0WvEg-dDrivzA",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd04.nfvi.localdomain",
      "transport_address": "172.17.41.148:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "eaIb-E2ET5-hqTUs9VQRqg",
      "node_name": "dxb-dso01-nec-nfvi2-cmn-nec-celshd03.nfvi.localdomain",
      "transport_address": "172.17.41.147:9300",
      "node_attributes": {
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "in_sync": false,
        "allocation_id": "g0WbM40GQumtOPpfwknVbg"
      }
    }
  ]
}
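
To read an output like this quickly, it can help to group the nodes by what they report under `store`. The sketch below is only an illustration: the helper name `summarize_store_copies` is made up, the sample is a trimmed copy of the output above with node names shortened to their `celshdNN` part, and it assumes that `"found": false` means the node holds no copy while `"in_sync": false` means it holds a copy that is not in the in-sync set.

```python
def summarize_store_copies(explain):
    """Group node names by the state of their on-disk shard copy."""
    stale, missing = [], []
    for decision in explain.get("node_allocation_decisions", []):
        store = decision.get("store", {})
        if store.get("found") is False:
            # The node reports no copy of this shard at all.
            missing.append(decision["node_name"])
        elif store.get("in_sync") is False:
            # A copy exists on disk but is not an in-sync copy.
            stale.append(decision["node_name"])
    return {"stale": stale, "missing": missing}


# Trimmed version of the explain output above (node names shortened).
sample = {
    "index": "error-administrator-2019.06.21",
    "shard": 1,
    "primary": True,
    "node_allocation_decisions": [
        {"node_name": "celshd02", "store": {"in_sync": False, "allocation_id": "OIUY3wBkREK9plyiLOIsnw"}},
        {"node_name": "celshd05", "store": {"found": False}},
        {"node_name": "celshd01", "store": {"found": False}},
        {"node_name": "celshd04", "store": {"found": False}},
        {"node_name": "celshd03", "store": {"in_sync": False, "allocation_id": "g0WbM40GQumtOPpfwknVbg"}},
    ],
}

if __name__ == "__main__":
    print(summarize_store_copies(sample))
```

For this shard, two nodes hold a copy that is not in sync and three hold nothing, which is consistent with the `no_valid_shard_copy` status.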

Below is the heap allocated to ES:

-Xms31232m
-Xmx31232m
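
To put that setting next to the shard count, here is a rough sketch that converts the flag to GiB and divides the shards per data node by it. The helper name `parse_heap_mib` is hypothetical, the shard and node counts are taken from the cluster health output earlier in the thread, and the result assumes an even spread of all shard copies over the 5 data nodes.

```python
import re


def parse_heap_mib(flag):
    """Parse a JVM heap flag such as -Xmx31232m and return the size in MiB."""
    match = re.fullmatch(r"-Xm[sx](\d+)([kmg])", flag.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized heap flag: {flag!r}")
    value, unit = int(match.group(1)), match.group(2).lower()
    factor = {"k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return value * factor


# Counts from the cluster health output (active + initializing + unassigned).
TOTAL_SHARDS = 699 + 2 + 1762
DATA_NODES = 5

heap_gib = parse_heap_mib("-Xmx31232m") / 1024
shards_per_gib = (TOTAL_SHARDS / DATA_NODES) / heap_gib

if __name__ == "__main__":
    print(heap_gib)
    print(round(shards_per_gib, 2))
```

So each data node has 30.5 GiB of heap, or roughly 16 shard copies per GiB of heap if every shard were assigned.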

As you opened a new discussion at "Shards unassigned after datanode addition", let's continue there.

Next time, unless it's a different, unrelated problem, please keep the discussion in one place. Thanks.
