We have a 2-node ES cluster supporting a custom app. The nodes are as follows:
- 'hot' server with 64 CPUs, 256 GB RAM, 40 TB of data, and IP 1.2.3.4 (obfuscated)
- 'warm' server with 24 CPUs, 128 GB RAM, 100 TB of data, and IP 5.6.7.8
- both running Red Hat Enterprise Linux 9
- both running ES 7.10 (yes, an old version, kept for licensing reasons; above my pay grade!)
We installed the Linux version of Kibana on the warm server and, in general, it works great. Unfortunately, whenever we perform maintenance (e.g. a patch/reboot) and the cluster has to resync, recovery stalls at 99% with one shard failing allocation -- and that shard belongs to the index named ".kibana_1", which is clearly Kibana's configuration data.
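For reference, this is roughly how I confirm which shard is stuck after a reboot (commands reproduced from memory, so treat them as approximate):
# overall recovery status; 'unassigned_shards' sits at 1
$ curl -s 'localhost:9200/_cluster/health?pretty'
# list only the unassigned shard(s) and the reason
$ curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED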
I have been working around this by deleting it (the whole .kibana_1 index). When I then re-open Kibana, it prompts me for initial setup and works fine; the index gets recreated (on the warm server again) and the cluster is 100% happy with 'green' status -- until the next maintenance window triggers a resync again. Is there a way to reconfigure things so that the shard reallocates successfully on its own?
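In case it matters, the workaround is roughly just this (reconstructed, since I don't have the exact command in front of me; destructive, I know, but Kibana recreates the index on its next startup):
$ curl -X DELETE 'localhost:9200/.kibana_1'
$ curl -s 'localhost:9200/_cluster/health?pretty'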
Here's what the allocation 'explain' API shows (all node names, IPs, and hostnames are obfuscated):
$ curl localhost:9200/_cluster/allocation/explain?pretty
{
  "index" : ".kibana_1",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2025-06-19T22:02:35.168Z",
    "details" : "node_left [def456]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "def456",
      "node_name" : "warm-server",
      "transport_address" : "5.6.7.8:9300",
      "node_attributes" : {
        "box_type" : "warm"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : "node does not match index setting [index.routing.allocation.require] filters [box_type:\"hot\"]"
        }
      ]
    },
    {
      "node_id" : "abc123",
      "node_name" : "hot-server",
      "transport_address" : "1.2.3.4:9300",
      "node_attributes" : {
        "box_type" : "hot"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[.kibana_1][0], node[abc123], [P], s[STARTED], a[id=ghi789]]"
        }
      ]
    },
    {
      "node_id" : "abc1234",
      "node_name" : "hot-server-data",
      "transport_address" : "1.2.3.4:9301",
      "node_attributes" : {
        "box_type" : "hot"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to host address [5.6.7.8], on node [abc1234], and [cluster.routing.allocation.same_shard.host] is [true] which forbids more than one node on this host from holding a copy of this shard"
        }
      ]
    }
  ]
}
At the suggestion of other techs, I have tried updating the index settings with (roughly the command shown after this list):
- "number_of_replicas" : 0
- "auto_expand_replicas" : false
but neither of those changes helped.
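For completeness, the settings update I ran looked roughly like this (reconstructed from shell history, so the exact payload may differ slightly):
$ curl -X PUT 'localhost:9200/.kibana_1/_settings' -H 'Content-Type: application/json' -d '
{
  "index" : {
    "number_of_replicas" : 0,
    "auto_expand_replicas" : false
  }
}'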
Instead, I'm wondering if the issue is that I installed Kibana on the 'warm' server, while the 'explain' output shows the index requires allocation on a 'hot' node (the index.routing.allocation.require filter). Is there a way to configure that shard to be happy on the warm server instead? (Which it is, happily, when I re-initialize it from scratch.)
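To frame the question: based on the explain output, I assume the relevant knob is index.routing.allocation.require.box_type on the .kibana_1 index, and that something like the following (inspecting it, then setting it to null to clear it, or perhaps to "warm") is what I'd need -- but I have not tried this and would like confirmation that it's the right/safe approach:
# inspect the current allocation filter on the index
$ curl -s 'localhost:9200/.kibana_1/_settings?pretty'
# clear the 'require hot' filter (untested on my cluster)
$ curl -X PUT 'localhost:9200/.kibana_1/_settings' -H 'Content-Type: application/json' -d '
{
  "index.routing.allocation.require.box_type" : null
}'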
Any suggestions are appreciated! Thank you!