Unable to ship metrics when using metricbeat provided index template, 400 error "routing_missing_exception"

Hello,

We are using metricbeat (8.14) on k8s to ship metrics to an elasticsearch (8.13) cluster. This has been working well, but last week we discovered a mapping issue: some of our metrics are sometimes mapped as a float and sometimes as a long. We have been using dynamic mapping, which is likely the cause. We would like to move to static mapping using the metricbeat provided index template. Our current index template looks like this:

{
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "bmap_obsv_ilm_policy"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "mapping": {
          "total_fields": {
            "limit": "2000"
          }
        },
        "refresh_interval": "30s",
        "number_of_shards": "1",
        "number_of_replicas": "1"
      }
    },
    "mappings": {
      "dynamic": "true"
    },
    "aliases": {}
  }
}
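To illustrate the float/long flip described above, here is a minimal sketch of how dynamic mapping can give the same field different types in different backing indices. It is a simplified model, not Elasticsearch code: the field name `system_load` and the sample documents are made up, and the only rule modeled is that the first numeric value seen for a field in an index decides its type (JSON integer becomes long, JSON floating point becomes float).

```python
# Simplified model of dynamic numeric mapping (an illustration, not Elasticsearch code).
# Assumption: the first value seen for a field in an index decides that field's type.

def dynamic_type(value):
    """Return the type a dynamic mapping would pick for a JSON number."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def map_index(docs):
    """Build a field -> type mapping for one index, first value wins."""
    mapping = {}
    for doc in docs:
        for field, value in doc.items():
            if field not in mapping:
                mapping[field] = dynamic_type(value)
    return mapping


# Hypothetical documents: same field, different arrival order on two days.
day_one = [{"system_load": 1}, {"system_load": 0.5}]
day_two = [{"system_load": 0.5}, {"system_load": 1}]

print(map_index(day_one))  # {'system_load': 'long'}
print(map_index(day_two))  # {'system_load': 'float'}
```

With a static mapping the type is fixed up front, so arrival order no longer matters.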

On our test environment I was able to switch to the metricbeat provided index template and everything worked as expected. However, when I switched our dev environment over to it, I got the following error from metricbeat:

{"log.level":"debug","@timestamp":"2024-08-27T16:49:15.747Z","log.logger":"elasticsearch","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).bulkCollectPublishFails","file.name":"elasticsearch/client.go","file.line":430},"message":"Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Meta:null, Fields:null, Private:interface {}(nil), TimeSeries:false}, Flags:0x0, Cache:publisher.EventCache{m:mapstr.M(nil)}, EncodedEvent:(*elasticsearch.encodedEvent)(0xc004c26d80)} (status=400): {\"type\":\"routing_missing_exception\",\"reason\":\"routing is required for [.ds-dev2-cust-eastus-metrics-8.14.0-2024.08.27-2024.08.27-000001]/[noC9lJEBUqYkhwlM2TDh]\",\"index_uuid\":\"_na_\",\"index\":\".ds-dev2-cust-eastus-metrics-8.14.0-2024.08.27-2024.08.27-000001\"}, dropping event!","service.name":"metricbeat","ecs.version":"1.6.0"}

I'm looking for help trying to figure out why this might be.
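For anyone triaging a lot of these lines, here is a small sketch that pulls the status code and the Elasticsearch error out of a metricbeat log line shaped like the one above. The regular expression is based only on that one log line, so treat the message format as an assumption. The sample line is abbreviated (the `publisher.Event{...}` part is shortened) and the function name is made up for this example.

```python
import json
import re

# Matches "(status=400): {...}, dropping event!" inside the log message.
# Assumption: the message format is the one seen in the log line in this thread.
DROPPED = re.compile(r"\(status=(\d+)\): (\{.*\}), dropping event!", re.DOTALL)


def parse_dropped_event(log_line):
    """Return status, error type, index and reason, or None if the line does not match."""
    record = json.loads(log_line)
    match = DROPPED.search(record.get("message", ""))
    if match is None:
        return None
    error = json.loads(match.group(2))
    return {
        "status": int(match.group(1)),
        "type": error.get("type"),
        "index": error.get("index"),
        "reason": error.get("reason"),
    }


# Abbreviated sample built with json.dumps so the inner quotes are escaped correctly.
index_name = ".ds-dev2-cust-eastus-metrics-8.14.0-2024.08.27-2024.08.27-000001"
error_body = json.dumps({
    "type": "routing_missing_exception",
    "reason": f"routing is required for [{index_name}]/[noC9lJEBUqYkhwlM2TDh]",
    "index": index_name,
})
sample_line = json.dumps({
    "log.level": "debug",
    "message": f"Cannot index event publisher.Event{{...}} (status=400): {error_body}, dropping event!",
})

parsed = parse_dropped_event(sample_line)
print(parsed["status"], parsed["type"])
```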

Here is the index template which we are trying to move to but having issues:

(too large to include here)

Switching back to the current basic dynamic index template fixes the issue and allows metricbeat to start shipping metrics again. The index template is the only thing that changes, so something specified in it seems to be causing the problem, but I'm unsure what.

If anyone can help explain why this would work fine in one of our environments but cause 400 errors in another, and how to fix those errors, that would be much appreciated.

On the env that is not working...

Hi @dfinn

You need to show us all the index settings for the index that is throwing the errors

GET .ds-dev2-cust-eastus-metrics-8.14.0-2024.08.27-2024.08.27-000001/_settings

Then

GET _nodes

and show the roles and routing settings from same env...
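If the two responses are large, a sketch like the following can reduce them to just the parts being asked about. It works on already-parsed JSON (Python dicts) shaped like the `_settings` and `_nodes` responses. The sample data, node id and node name are made up, and the function names are hypothetical.

```python
def tier_preferences(settings_response):
    """Map each index name to its list of preferred tiers (empty if not set)."""
    result = {}
    for index, body in settings_response.items():
        include = (
            body.get("settings", {})
            .get("index", {})
            .get("routing", {})
            .get("allocation", {})
            .get("include", {})
        )
        preference = include.get("_tier_preference", "")
        result[index] = [tier for tier in preference.split(",") if tier]
    return result


def node_roles(nodes_response):
    """Map each node name (or id, if unnamed) to its sorted list of roles."""
    return {
        node.get("name", node_id): sorted(node.get("roles", []))
        for node_id, node in nodes_response.get("nodes", {}).items()
    }


# Made-up sample data in the shape of the two responses.
settings_sample = {
    "example-index-000001": {
        "settings": {
            "index": {
                "routing": {
                    "allocation": {"include": {"_tier_preference": "data_hot"}}
                }
            }
        }
    }
}
nodes_sample = {
    "nodes": {"example-node-id": {"name": "example-node", "roles": ["ingest", "data"]}}
}

print(tier_preferences(settings_sample))  # {'example-index-000001': ['data_hot']}
print(node_roles(nodes_sample))           # {'example-node': ['data', 'ingest']}
```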

ok @stephenb , first off I made a mistake at some point and thought this was working in our test environment but we have issues there as well. at least it's consistent across our environments.

here is the output you asked for
index settings for today:

node output:

as far as I can tell, all the routing settings look fine. it's looking for data_hot and I have nodes that have the data role which should be satisfying that.

also including my metricbeat output section in case that is helpful:

output.elasticsearch:
  hosts: "[${ELASTICSEARCH_HOST}:9200]"
  protocol: "${ELASTICSEARCH_PROTOCOL:https}"
  username: "${ELASTICSEARCH_USERNAME:elastic}"
  ssl.verification_mode: "${ELASTICSEARCH_VERIFYSSL:none}"
  password: "${ELASTICSEARCH_PASSWORD:welcome1}"
  index: "${INDEX_NAME}-%{[agent.version]}-%{+yyyy.MM.dd}"
  allow_older_versions: true
env|grep INDEX_NAME
INDEX_NAME=platdev2-cust-eastus-metrics

also, I upgraded our test environment to 8.14.3 to see if that changed anything and it has not, still having the same issue when applying the metricbeat index template

hi @dfinn

Pretty sure you are missing the correct roles on your nodes...

Your index

        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot" <<< Where it will go... 
            }
          }
        },

Your nodes...

      "roles": [ <<< NOT HERE , nowhere for index to go..
        "data",
        "ingest"

      ],

should be...

          "roles": [
            "ingest",
            "transform",
            "data_hot", <<< MISSING : Where your index wants to go
            "data_content", <<< MISSING 
            "master",
            "remote_cluster_client"
          ]
        },

see here

that is interesting because this works perfectly fine before applying the metricbeat index template. what is it about that template that would require this change?

I'm looking for where I saw this but yesterday I found somewhere in the elasticsearch documentation that having the data role on a node satisfies the data_hot requirement. Trying to find where I saw that...

here it is:

" Data node
A node that has one of several data roles. Data nodes hold data and perform data related operations such as CRUD, search, and aggregations. A node with a generic data role can fill any of the specialized data node roles."

and lower down on that same page:

" If you want to include a node in all tiers, or if your cluster does not use multiple tiers, then you can use the generic data role."

so I'm not sure this is actually required and it does seem to work fine prior to applying this metricbeat index template.
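The reading of the two doc quotes above can be written down as a small check. This is only a sketch of that reading (a node with the generic data role counts for any tier), not the actual allocation logic in Elasticsearch, and the node names are made up.

```python
def satisfies_tier(roles, tier):
    """True if a node with these roles can hold data for the given tier.

    Assumption taken from the quoted docs: the generic "data" role
    can fill any of the specialized data tier roles.
    """
    return tier in roles or "data" in roles


def first_usable_tier(tier_preference, nodes):
    """Return the first tier in a comma-separated preference that some node satisfies."""
    for tier in tier_preference.split(","):
        if any(satisfies_tier(roles, tier) for roles in nodes.values()):
            return tier
    return None


# Roles as reported in this thread, on a hypothetical node name.
current_nodes = {"example-node": ["data", "ingest"]}
# A node with no data role at all, for comparison.
no_data_nodes = {"example-node": ["ingest", "master"]}

print(first_usable_tier("data_hot", current_nodes))  # data_hot
print(first_usable_tier("data_hot", no_data_nodes))  # None
```

Under that reading, nodes with roles data and ingest already satisfy a data_hot preference.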

The hot tier is required. New indices that are part of a data stream are automatically allocated to the hot tier.

To create a dedicated hot node, set:

node.roles: [ data_hot ]

It is a data stream... it needs the correct roles...

Perhaps I am wrong... but they need to match....

From what I'm reading we should be satisfying the requirement, and it does work as expected before applying the MB template: metrics come in and are written to the data stream, so I'm nearly sure this is not required. I'm happy to test though. Is there a way to modify the roles of an existing node? So far I'm not finding a way to do that.

Unclear what this means ... what do you mean before applying the template? What was used before ... are you upgrading from 7.X or something...

Yeah, I think the docs can be a bit confusing... data streams require the correct roles...

Metricbeat 8.x uses data streams
Data streams require the correct roles

You define a node’s roles by setting node.roles in elasticsearch.yml .

Sorry, I mentioned this in the original post but maybe it wasn't clear. Prior to trying to use the specific metricbeat index template we have been using a very basic index template which uses dynamic mapping and this has been working perfectly fine for us other than occasionally running into issues where field types get set incorrectly, hence the need to use the metricbeat index template if possible. Here is our current/old index template for metricbeat datastreams:

{
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "bmap_obsv_ilm_policy"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "mapping": {
          "total_fields": {
            "limit": "2000"
          }
        },
        "refresh_interval": "30s",
        "number_of_shards": "1",
        "number_of_replicas": "1"
      }
    },
    "mappings": {
      "dynamic": "true"
    },
    "aliases": {}
  }
}

pretty simple but as you can see it has the same routing settings specified. When using this index template everything works as expected, metricbeat can ship metrics and they get written to the correct datastream.

the issue arises when we switch to the metricbeat specific index template and it's not clear why that is causing issues. This happens on both 8.13.0 and 8.14.3.

Probably because your previous was not a data stream, just a plain ole index... so the node role was not enforced...

Are you sure that was a data stream? That previous template does not appear to be a data stream...

I am not sure at this point... but there is an easy test... fix your node roles....

yes, it was already a datastream and has been that way for quite some time. I'm testing now, attempting to add the 2 roles you mentioned to our data nodes. If it helps I could provide settings from our dev cluster which at this point I've rolled back to using the dynamic mapping template for metricbeat datastreams.

edit, here are the settings from today's metricbeat datastream from our dev cluster using the dynamic mapping template, you can see it's a datastream and has the same routing settings as our test env:

{
  ".ds-dev2-cust-eastus-metrics-8.14.0-2024.08.28-2024.08.28-000001": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "bmap_obsv_ilm_policy"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "mapping": {
          "total_fields": {
            "limit": "2000"
          }
        },
        "refresh_interval": "30s",
        "hidden": "true",
        "number_of_shards": "1",
        "provided_name": ".ds-dev2-cust-eastus-metrics-8.14.0-2024.08.28-2024.08.28-000001",
        "creation_date": "1724803201210",
        "number_of_replicas": "1",
        "uuid": "EZWofRcYROeXP-o4StyOVw",
        "version": {
          "created": "8503000"
        }
      }
    }
  }
}

and the roles in this environment are the same (which makes sense, this is all being deployed via playbooks so it should be identical):

      "roles": [
        "data",
        "ingest"
      ],

so, if there is a requirement to add these additional roles that requirement does not arise until applying the metricbeat specific index template

If you look deeply into the metricbeat template you will see this...

        "data_stream": {
          "hidden": false,
          "allow_custom_routing": false <<< Might be this setting which is requiring the data hot... 
        }
      }

It may be

AHHH I am missing something.... yours is a data stream

".ds-dev2-cust-eastus....

yes, nearly positive it's a data stream, it shows up in the data stream section in the kibana UI

it is probably the

"allow_custom_routing": false <<< Might be this setting which is requiring the data hot...

You could try changing that... in the template... then restart without reloading the template.
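Before changing anything, it may help to list the routing-related options a template actually sets. The sketch below reads a composable index template that has already been loaded into a Python dict. The `data_stream.allow_custom_routing` key is the one quoted above. The check for `_routing.required` in the mappings is my own addition (it is a real mapping option that makes a routing value mandatory), and nothing in this thread confirms that either setting is the cause. The template contents and the function name are made up.

```python
def routing_options(index_template):
    """Summarize the routing-related options found in an index template dict."""
    data_stream = index_template.get("data_stream")
    mappings = index_template.get("template", {}).get("mappings", {})
    return {
        "is_data_stream": data_stream is not None,
        # None means the key is not set in the template.
        "allow_custom_routing": (data_stream or {}).get("allow_custom_routing"),
        # Assumption: worth checking, not confirmed as the cause in this thread.
        "routing_required": mappings.get("_routing", {}).get("required", False),
    }


# Hypothetical template carrying the data_stream block quoted above.
example_template = {
    "index_patterns": ["example-metrics-*"],
    "data_stream": {"hidden": False, "allow_custom_routing": False},
    "template": {"mappings": {}},
}

print(routing_options(example_template))
```

Running this against both the old and the new template would show exactly which of these options differ between them.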

ok, interesting. the allow_custom_routing option comes from the default MB template and I just left it, I can try removing.

But...data is now flowing into that datastream after adding those 2 roles. So maybe something about the allow_custom_routing setting is what is requiring those roles to be specifically applied to the nodes?

what is the proper fix here? add these roles or set allow_custom_routing to false?

This is good, thanks for the discussion...
I had not seen the custom routing setting before...
Today I Learned (TIL)

actually, I may have spoken too soon. I thought I saw that there were now documents in this index, but after further inspection there are not any. I'm testing now with setting allow_custom_routing to false, but adding the roles to the nodes did not fix the issue.