Internal monitoring indices' shards in UNASSIGNED state

Hi,
We have an 8.x ES cluster running on Elastic Cloud. All seems to be working OK, except that a few internal monitoring indices have one shard marked as UNASSIGNED, which keeps the cluster state YELLOW all the time...

I followed this guide to troubleshoot:

Get info about shards - find which ones are unassigned:

GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state

Response as of Dec 14, 9:33PM EST:

index                                                         shard prirep state      node                unassigned.reason
.ds-.monitoring-es-8-mb-2022.12.14-000003                     0     r      UNASSIGNED                     INDEX_CREATED
.ds-.monitoring-kibana-8-mb-2022.12.14-000003                 0     r      UNASSIGNED                     INDEX_CREATED

Get more details about the allocation for the specific index/shard:

GET _cluster/allocation/explain
{
  "index": ".ds-.monitoring-es-8-mb-2022.12.14-000003", 
  "shard": 0, 
  "primary": false 
}

Result:

{
  "index": ".ds-.monitoring-es-8-mb-2022.12.14-000003",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED",
    "at": "2022-12-14T21:45:50.221Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
  "node_allocation_decisions": [
    {
      "node_id": "hYTLxs3cQwSSQNRWNCVHJg",
      "node_name": "instance-0000000001",
      "transport_address": "10.xxx:19240",
      "node_attributes": {
        "logical_availability_zone": "zone-0",
        "server_name": "instance-0000000001.aa7f0380c25d414f810f9a23d173130b",
        "availability_zone": "us-east4-a",
        "xpack.installed": "true",
        "data": "warm",
        "instance_configuration": "gcp.es.datawarm.n2.68x10x190",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "index has a preference for tiers [data_hot] and node does not meet the required [data_hot] tier"
        }
      ]
    },
    {
      "node_id": "RR_pUkQdRR2HqwjMlNhtVg",
      "node_name": "instance-0000000000",
      "transport_address": "10.xxx:19870",
      "node_attributes": {
        "region": "unknown-region",
        "instance_configuration": "gcp.es.datahot.n2.68x10x45",
        "server_name": "instance-0000000000.aa7f0380c25d414f810f9a23d173130b",
        "data": "hot",
        "xpack.installed": "true",
        "logical_availability_zone": "zone-0",
        "availability_zone": "us-east4-a"
      },
      "node_decision": "no",
      "weight_ranking": 2,
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node [[.ds-.monitoring-es-8-mb-2022.12.14-000003][0], node[RR_pUkQdRR2HqwjMlNhtVg], [P], s[STARTED], a[id=fxfbSsCnTq-DGvTI4laIPg]]"
        }
      ]
    }
  ]
}

This part seems to offer a promising reason:

"explanation": "index has a preference for tiers [data_hot] and node does not meet the required [data_hot] tier"

However, I'm not sure how to solve this, since I don't control the templates or ILM policies for these internal indices, and I haven't set any custom shard allocation policies... whatever the default policy is, it should be used as is.

My cluster config:
-- 2 nodes
-- 1 availability zone
-- Hot and Warm storage only, no cold

Any advice on how to troubleshoot this further would be very much appreciated!

Thank you!
Marina

Hi @ppine7

Are you saying you only have one hot and one warm node in one zone?

By default, all the system and monitoring indices (as well as basically everything else) expect to have one primary plus one replica shard.

If you only have one hot node, you have no place to put the replica shard, because a primary and its replica will not be placed on the same node.

So your choices are: set replicas to zero, live with the yellow cluster, or add another hot node, preferably in a different zone.

Same will go for warm when you get there... Basically, when you only have one zone, by default your cluster is always going to be yellow unless you set replicas to zero on every index. It's a way of telling you that you don't have any high availability. If you lose one node, you're down.
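For illustration, this is roughly what dropping replicas on the existing monitoring indices would look like with the update index settings API (the wildcard pattern here is an assumption - adjust it to match your actual index names):

PUT /.ds-.monitoring-*/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

Note that new backing indices created on rollover will still pick up the replica count from the index template, so this only clears the warning for the indices that already exist.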

Hopefully that makes sense.

Thank you, @stephenb !
Yes, unfortunately due to costs we decided to go with the cheapest configuration of the ES cluster, just 1 zone, since the data is not critical ....

I suspected that was the case... not enough nodes for all replicas. But I am still puzzled by a few points:
-- if the index template for those internal indices requires two shards (one primary plus one replica) - why aren't ALL of them UNASSIGNED, but only some?
What I mean is:
There are a few indices that were already created for monitoring, I'm assuming from the same template, right?

But when I check the indices - I see that only the last index from each "type/template" is in the YELLOW state with unassigned shards....
Here is an example:

so, for example , this index is GREEN:
.ds-.monitoring-kibana-8-mb-2022.12.11-000002

but the next one of the same type is YELLOW:
.ds-.monitoring-kibana-8-mb-2022.12.14-000003

Why does the first one have 0 unassigned shards but the second one 1? Shouldn't they both be created in exactly the same way?

So I also checked settings for indices to see which ones have replication set to 1:
GET .monitoring*/_settings/index.number_of_replicas

and I see that all of the above patterns have RF = 1 (not 0):

{
  ".ds-.monitoring-es-8-mb-2022.12.11-000002": {
    "settings": {
      "index": {
        "number_of_replicas": "1"
      }
    }
  },
  ".monitoring-beats-7-2022.12.15": {
    "settings": {
      "index": {
        "number_of_replicas": "0"
      }
    }
  },
  ".ds-.monitoring-es-8-mb-2022.12.14-000003": {
    "settings": {
      "index": {
        "number_of_replicas": "1"
      }
    }
  },
  ".ds-.monitoring-kibana-8-mb-2022.12.14-000003": {
    "settings": {
      "index": {
        "number_of_replicas": "1"
      }
    }
  },
  ".ds-.monitoring-kibana-8-mb-2022.12.11-000002": {
    "settings": {
      "index": {
        "number_of_replicas": "1"
      }
    }
  }
}

Still, one index is GREEN and the other is YELLOW - even though they were created from the same template (I assume).
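A quick way to compare health and replica count side by side, if it helps, is the _cat indices API (the column list here is just one possible choice):

GET _cat/indices/.ds-.monitoring-*?v=true&h=index,health,pri,rep&s=health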

Further I noticed one more weird (to me) thing:
the index and shard distribution is not equal across the two nodes that I have:

most of the shards are on the node 0000, and only 2 shards are on the node 0001.
Funnily enough, those 2 shards on the node 0001 are the GREEN shards from the indices I showed above:

while the later shards of the same indices are YELLOW on the node 0000 ....

something seems wrong with this distribution :slight_smile:

Thank you!!
Marina

Hi @ppine7

Back to Basics...

  1. If you only have 1 node per tier, then unless replicas are 0, you will always be yellow in a properly configured cluster, because a primary and its replicas cannot be on the same node.
  2. An index is only on 1 data tier at a time.
  3. Primaries and replicas for the same index live on the same data tier, i.e. they do not span data tiers, and are not allowed on the same node.

Solution: set all replicas to 0, or have 2 nodes per tier.

I really cannot debug all this... but in general, what I believe is happening is that there are some issues with respect to data_content, data_hot, and data_warm, such that some of those templates are allowing a primary/replica pair to span tiers, which is not normal - maybe it's a bug or some other side effect...

Not sure why you think that is wrong... it is showing all the primaries on 1 node... since you only have 1 hot node that is exactly what I would expect.....

If you set ALL indices to 0 replicas your cluster would go green... but as soon as a template creates a new index with a replica it will go yellow...

tl;dr if you are only in 1 zone, get used to a yellow cluster :slight_smile: or fix all the templates and indices!

This is exactly the reason! Per 3) above :slight_smile:

Thank you, @stephenb !

Since I cannot fix the cluster (not an option to increase the price tag), and do not want to live with a YELLOW cluster :slight_smile: - I decided to fix the templates....
Since all the monitoring indices seem to be backing indices of data streams, I was able to find out which templates they were associated with by inspecting the data streams:

GET _data_stream

{
  "data_streams": [
    {
      "name": ".monitoring-es-8-mb",
      "timestamp_field": {
        "name": "@timestamp"
      },
      "indices": [
        {
          "index_name": ".ds-.monitoring-es-8-mb-2022.12.11-000002",
          "index_uuid": "lBCmXl_sRIy2pKuY6PbmrQ"
        },
        {
          "index_name": ".ds-.monitoring-es-8-mb-2022.12.14-000003",
          "index_uuid": "ONcGZeZDSxqyVLgHC2tW6A"
        }
      ],
      "generation": 4,
      "status": "YELLOW",
      "template": ".monitoring-es-mb",
      "ilm_policy": ".monitoring-8-ilm-policy",
      "hidden": false,
      "system": false,
      "allow_custom_routing": false,
      "replicated": false
    },
...

Here is one such template:
"template": ".monitoring-es-mb"

Then I was able to update this index template manually via Stack Management -> Index Management -> Templates console UI.
Once the data stream rolled over, I confirmed that the new index was created GREEN (and the old one turned GREEN too, which is still puzzling)!
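For reference, a rollover like this can also be forced manually on the data stream, without waiting for ILM - e.g. for the stream discussed here:

POST /.monitoring-kibana-8-mb/_rollover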

Now I would like to do the same updates to all other similar index templates, in all environments, via an API call - to be able to incorporate this into our automation later on.

And I have a question about how to do this safely: I only want to update one setting:

"number_of_replicas": "0" 

Looking at the example update template commands, I found this doc: Change mappings and settings for a data stream | Elasticsearch Guide [8.5] | Elastic, and from what I see, a command like the following should be OK:

PUT /_index_template/.monitoring-kibana-mb
{
  "index_patterns": [ ".monitoring-kibana-8-*" ],
  "data_stream": { },
  "template": {
    "settings": {
      "number_of_replicas": "0"             
    }
  }
}

But my worry is:
is it going to update ONLY the number_of_replicas setting, or is it going to update/wipe out other parts of this very large and complex template, like index patterns, etc.?
Do I need to specify the index_patterns and data_stream parts in the update command?

I'd rather not experiment and wipe out working templates :slight_smile: - although I did make a backup just in case ....

Thank you!!
Marina

@ppine7 As far as I can tell, you have to PUT the entire template, not just the setting you want to update... I could not get just a partial update to work.

yeah.... looks like it is not supported, according to this request: PUT Index template - update value · Issue #57499 · elastic/elasticsearch · GitHub

I wish the ES team would consider supporting this though, as being forced to re-upload the full template is very error-prone, especially for huge templates like the new internal monitoring ones...

In fact, with the new 8.x style of index templates, it is less clear how to update/create them, as they now seem to be composed of multiple sub-parts...
For example - the ".monitoring_es_mb" template:

  • has 3698 lines (!)
  • has an array of index_template objects
  • has interesting properties at the end, such as 'composed_of' and 'version'...

so I am not sure what the full PUT command should look like - a PUT for one template? or an array?
Would this version be the correct one:

PUT /_index_template/.monitoring-kibana-mb
{
  "index_patterns": [ ".monitoring-kibana-8-*" ],
  "data_stream": {... },
  "template": {
    "settings": {
      "number_of_replicas": "0"             
    }
  }
}

or this one:

PUT /_index_template/.monitoring-kibana-mb
{
  "index_templates": [
    {
      "name": ".monitoring-es-mb",
      "index_template": {
        "index_patterns": [
          ".monitoring-es-8-*"
        ],
        "template": {
          "settings": {
            "index": {
              "lifecycle": {
                "name": ".monitoring-8-ilm-policy"
              },
              "number_of_replicas": "0",
              "mapping": {
                "total_fields": {
                  "limit": "2000"
                }
              }
            }
          },
          "mappings": {...}
        },
        "composed_of": [],
        "version": 8000102,
        "data_stream": {
          "hidden": false,
          "allow_custom_routing": false
        }
      }
    }
  ]
}

??

Sorry, maybe this is now taking this post in a different direction.... if so - I can post a separate question just on how to update the new composable data stream templates :slight_smile:

Thank you!!

Hmmm, maybe I am missing something - I do this all the time... I don't have time to completely understand the RESTness of the APIs; tl;dr, my understanding is that we name our resources... There are requests about being able to "re-PUT" a GET... not my area of expertise...

GET /_index_template/.monitoring-beats-mb

Result
You need to remove the outer wrapper - the array and the name field -
and the last } ] }

{                                    <!--- Get rid of these 
  "index_templates": [               <!--- Get rid of these 
    {                                <!--- Get rid of these 
      "name": ".monitoring-beats-mb",<!--- Get rid of these 
      "index_template": {   <!---- the PUT request starts with this {
        "index_patterns": [
          ".monitoring-beats-8-*"
        ],
        "template": {
          "settings": {
            "index": {
              "lifecycle": {
                "name": ".monitoring-8-ilm-policy"
              },
              "mapping": {
                ....
        },
        "composed_of": [],
        "version": 8000103,
        "data_stream": {
          "hidden": false,
          "allow_custom_routing": false
        }
      }
    }  <!--- Get rid of these 
  ] <!--- Get rid of these 
} <!--- Get rid of these 

So the PUT looks like:

PUT  /_index_template/.monitoring-beats-mb
{
  "index_patterns": [
    ".monitoring-beats-8-*"
  ],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": ".monitoring-8-ilm-policy"
        },
        "mapping": {
          "total_fields": {
            "limit": "2000"
          }
        },
        "number_of_replicas": "0"
      }
    },
    "mappings": {
....
    }
  },
  "composed_of": [],
  "version": 8000103,
  "data_stream": {
    "hidden": false,
    "allow_custom_routing": false
  }
}
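After such a PUT, the change can be spot-checked without dumping the whole template by using filter_path (the path below assumes the standard GET response shape shown earlier in this thread):

GET /_index_template/.monitoring-beats-mb?filter_path=index_templates.index_template.template.settings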

Thank you, @stephenb ! Yes, this worked - I was able to update the internal index templates with 0 replicas, without breaking them :slight_smile:

And once I force-rolled over those indices - the new ones were GREEN.

thank you!


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.