Unassigned Shards with error 'Failed to create index' - NullPointerException

I encountered a problem bringing an Elasticsearch 7.7.0 cluster (with the Open Distro security plugin installed) back from a full cluster restart. The cluster has 3 master, 3 coordinating, 2 hot, 2 warm, and 2 cold nodes.

All nodes appear to have rejoined the cluster successfully:

ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.2.3.197            8          99   1    0.12    0.22     0.22 dilt      -      eshost1551.example.com
10.2.3.202           67          95  27    1.24    0.95     0.98 ilm       *      eshost1570.example.com
10.2.3.200            3          84   1    0.09    0.08     0.12 dilt      -      eshost1550.example.com
10.2.3.201           13          97   2    0.18    0.15     0.14 ilm       -      eshost1569.example.com
10.2.3.205            9          97   2    0.27    0.15     0.12 l         -      eshost1573.example.com
10.2.3.196            1          63   1    0.13    0.15     0.27 dilt      -      eshost1554.example.com
10.2.3.206            8          97   1    0.04    0.04     0.06 l         -      eshost1574.example.com
10.2.3.199            3          96   2    0.20    0.14     0.14 dilt      -      eshost1549.example.com
10.2.3.204            8          75   2    0.02    0.08     0.12 l         -      eshost1572.example.com
10.2.3.203            9          94   2    0.11    0.11     0.13 ilm       -      eshost1571.example.com
10.2.3.195           63          94   3    0.20    0.18     0.24 dilt      -      eshost1553.example.com
10.2.3.198            2          95   2    0.20    0.23     0.22 dilt      -      eshost1552.example.com
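
(For reference, the node listing above is from the _cat/nodes API; roughly this call, with localhost:9200 standing in for our real endpoint:)

curl -s 'http://localhost:9200/_cat/nodes?v&h=ip,heap.percent,ram.percent,cpu,load_1m,load_5m,load_15m,node.role,master,name'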

But roughly 2,000 of the ~2,400 shards are unassigned; only one hot node (eshost1553) has any shards allocated to it. Calling the "/_cluster/reroute?retry_failed=true" endpoint forces another allocation attempt, but every shard just returns to UNASSIGNED. The "/_cluster/allocation/explain" endpoint reports that index creation failed with a NullPointerException:

{
  "index": "security-auditlog-2022.01.30",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "MANUAL_ALLOCATION",
    "at": "2022-07-01T17:09:15.230Z",
    "details": "failed shard on node [ZZFUy43-So6NBdyhM_dPBw]: failed to create index, failure NullPointerException[null]",
    "last_allocation_status": "awaiting_info"
  },
  "can_allocate": "yes",
  "allocate_explanation": "can allocate the shard",
  "target_node": {
    "id": "jhzbGKS8TQePfepXOb_0kw",
    "name": "eshost1549.example.com",
    "transport_address": "10.2.3.199:9300",
    "attributes": {
      "ml.machine_memory": "67368648704",
      "ml.max_open_jobs": "20",
      "xpack.installed": "true",
      "box_type": "cold",
      "transform.node": "true"
    }
  },
  "allocation_id": "pi2twkWMSly5j9K6-GaGDQ",
  "node_allocation_decisions": [
    {
      "node_id": "ZZFUy43-So6NBdyhM_dPBw",
      "node_name": "eshost1550.example.com",
      "transport_address": "10.2.3.200:9300",
      "node_attributes": {
        "ml.machine_memory": "67368660992",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "box_type": "cold",
        "transform.node": "true"
      },
      "node_decision": "yes",
      "store": {
        "in_sync": true,
        "allocation_id": "JVRoSHUcTRiuXg_YvcKilw"
      }
    },
    {
      "node_id": "jhzbGKS8TQePfepXOb_0kw",
      "node_name": "eshost1549.example.com",
      "transport_address": "10.2.3.199:9300",
      "node_attributes": {
        "ml.machine_memory": "67368648704",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "box_type": "cold",
        "transform.node": "true"
      },
      "node_decision": "yes",
      "store": {
        "in_sync": true,
        "allocation_id": "pi2twkWMSly5j9K6-GaGDQ"
      }
    },
    {
      "node_id": "Jw9QJ3S5Tvyd36-iT8wprw",
      "node_name": "eshost1552.example.com",
      "transport_address": "10.2.3.198:9300",
      "node_attributes": {
        "ml.machine_memory": "67368648704",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "box_type": "warm",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "t0jOfIYASESl8rQ66RPUyg",
      "node_name": "eshost1551.example.com",
      "transport_address": "10.2.3.197:9300",
      "node_attributes": {
        "ml.machine_memory": "67368660992",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "box_type": "warm",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "z5llgKCRQ16M1rG0QhiHew",
      "node_name": "eshost1554.example.com",
      "transport_address": "10.2.3.196:9300",
      "node_attributes": {
        "ml.machine_memory": "67368419328",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "box_type": "hot",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "zN0fjtZhTHiNkxtW5f3zqg",
      "node_name": "eshost1553.example.com",
      "transport_address": "10.2.3.195:9300",
      "node_attributes": {
        "ml.machine_memory": "67368419328",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true",
        "box_type": "hot",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    }
  ]
}
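
(For completeness, these are roughly the calls behind the two endpoints mentioned above; host and port are placeholders:)

# Retry shards whose allocation previously failed
curl -s -XPOST 'http://localhost:9200/_cluster/reroute?retry_failed=true&pretty'

# Ask the cluster why this specific shard is unassigned
curl -s -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty' \
  -H 'Content-Type: application/json' \
  -d '{ "index": "security-auditlog-2022.01.30", "shard": 0, "primary": true }'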

Debug logging shows the same failure, with a stack trace pointing into the Open Distro security plugin:

[2022-07-01T11:38:39,888][DEBUG][o.e.i.IndicesService     ] [eshost.example.com] creating Index [[security-auditlog-2020.11.26/Ne-p6ajnSyCv-5xM-UcOMA]], shards [1]/[1] - reason [CREATE_INDEX]
[2022-07-01T11:38:39,893][DEBUG][o.e.g.PersistedClusterStateService] [eshost.example.com] writing cluster state took [0ms]; wrote global metadata [false] and metadata for [16] indices and skipped [760] unchanged indices
[2022-07-01T11:38:39,894][DEBUG][c.a.o.s.OpenDistroSecurityPlugin] [eshost.example.com] Handle complianceConfig=null/dlsFlsAvailable: true/auditLog=class com.amazon.opendistroforelasticsearch.security.auditlog.NullAuditLog for onIndexModule() of index security-auditlog-2020.11.26
[2022-07-01T11:38:39,894][WARN ][o.e.i.c.IndicesClusterStateService] [eshost.example.com] [security-auditlog-2020.11.26][0] marking and sending shard failed due to [failed to create index]
java.lang.NullPointerException: null
        at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin.onIndexModule(OpenDistroSecurityPlugin.java:519) ~[?:?]
        at org.elasticsearch.plugins.PluginsService.onIndexModule(PluginsService.java:267) ~[elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:607) ~[elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:549) ~[elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:176) ~[elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndices(IndicesClusterStateService.java:484) ~[elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:246) ~[elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517) ~[elasticsearch-7.7.0.jar:7.7.0]
        at java.lang.Iterable.forEach(Iterable.java:75) [?:1.8.0_251]
        at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:514) [elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73) [elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) [elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.7.0.jar:7.7.0]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.7.0.jar:7.7.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_251]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_251]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_251]
[2022-07-01T11:38:39,895][DEBUG][o.e.c.a.s.ShardStateAction] [eshost.example.com] sending [internal:cluster/shard/failure] to [YpOcM6LtTIKVV3swKpZAWA] for shard entry [shard id [[security-auditlog-2020.11.26][0]], allocation id [n1zocy8ZTFqgFckuyVd9lg], primary term [0], message [failed to create index], failure [NullPointerException[null]], markAsStale [true]]

Trace logging shows:

-- index [[security-auditlog-2020.11.26/Ne-p6ajnSyCv-5xM-UcOMA]]
----shard_id [security-auditlog-2020.11.26][0]
--------[security-auditlog-2020.11.26][0], node[null], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[UNASSIGNED], unassigned_info[[reason=MANUAL_ALLOCATION], at[2022-07-01T17:05:16.424Z], delayed=false, details[failed shard on node [z5llgKCRQ16M1rG0QhiHew]: failed to create index, failure NullPointerException[null]], allocation_status[deciders_throttled]]
--------[security-auditlog-2020.11.26][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2022-07-01T04:17:17.657Z], delayed=false, allocation_status[no_attempt]]
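
(The debug and trace output above was captured by raising logger levels through the cluster settings API; the exact logger names below are a best guess based on the log categories shown:)

# Raise logger verbosity (transient, so it resets on a full restart)
curl -s -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{
        "transient": {
          "logger.org.elasticsearch.indices": "DEBUG",
          "logger.org.elasticsearch.cluster.service": "TRACE"
        }
      }'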

What can we do to understand what is causing the shards to remain unassigned?

The NullPointerException is coming from the Open Distro security plugin (OpenDistroSecurityPlugin.onIndexModule in your stack trace), which is not supported here. I would recommend opening an issue on the OpenDistro/OpenSearch forum.


OpenSearch/OpenDistro are AWS-run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

Thanks for the pointer; we'll dig into the OpenDistro plugin and see what we can find.
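
(As a first step, we'll likely confirm which plugin versions are installed on each node; something like this, again with host and port as placeholders:)

curl -s 'http://localhost:9200/_cat/plugins?v&h=name,component,version'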
