Metricbeat-Kubernetes startup error when upgrading from version 7.2 to 7.16.1

On a cluster recently upgraded from version 7.2 to 7.16.1, we tried to update Metricbeat for our Kubernetes pods. However, on startup we got the following error:

ERROR metrics/metrics.go:304 error determining cgroups version: error reading /proc/11483/cgroup: open /proc/11483/cgroup: no such file or directory

After some research, we suspect this may be related to the currently open issue: Monitoring: allow specifying /proc or hostfs path · Issue #23267 · elastic/beats · GitHub

We then tried to upgrade to at least version 7.13, but again got an error, pasted below:

2022-02-14T15:14:33.811Z        WARN    [elasticsearch] elasticsearch/client.go:408     Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc07aba5664322f24, ext:123112042418, loc:(*time.Location)(0x55f6ac9f4ee0)}, Meta:null, Fields:{"agent":{"ephemera
l_id":"e1a17184-4daa-45b6-a4db-ae06a9f042b7","hostname":"gr-central-prod-backend06","id":"514bfcb6-f987-4ce9-9867-86522b6a86cd","name":"gr-central-prod-backend06","type":"metricbeat","version":"7.13.1"},"ecs":{"version":"1.9.0"},"event":{"dataset":"system.diskio","duration":610911
,"module":"system"},"fields":{"env":"production"},"host":{"disk":{"read.bytes":0,"write.bytes":388341760},"name":"gr-central-prod-backend06"},"metricset":{"name":"diskio","period":30000},"service":{"type":"system"},"tags":["backend"]}, Private:interface {}(nil), TimeSeries:true},
Flags:0x0, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"illegal_argument_exception","reason":"Limit of total fields [1000] has been exceeded while adding new fields [2]"}}
2022-02-14T15:14:33.811Z        WARN    [elasticsearch] elasticsearch/client.go:408     Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc07aba566444ac11, ext:123113254007, loc:(*time.Location)(0x55f6ac9f4ee0)}, Meta:null, Fields:{"agent":{"ephemera
l_id":"e1a17184-4daa-45b6-a4db-ae06a9f042b7","hostname":"gr-central-prod-backend06","id":"514bfcb6-f987-4ce9-9867-86522b6a86cd","name":"gr-central-prod-backend06","type":"metricbeat","version":"7.13.1"},"ecs":{"version":"1.9.0"},"event":{"dataset":"system.network","duration":23067
50,"module":"system"},"fields":{"env":"production"},"host":{"name":"gr-central-prod-backend06","network":{"in":{"bytes":10503604304,"packets":8045067},"out":{"bytes":10188621484,"packets":6522114}}},"metricset":{"name":"network","period":30000},"service":{"type":"system"},"tags":[
"backend"]}, Private:interface {}(nil), TimeSeries:true}, Flags:0x0, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"illegal_argument_exception","reason":"Limit of total fields [1000
] has been exceeded while adding new fields [2]"}}

We tried increasing index.mapping.total_fields.limit (e.g. to 2000), but this did not help.
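For reference, a sketch of how such a change can be applied to the existing index via the Dev Tools console (the index name is taken from the error output below; the exact request we used may have differed):

```
PUT metricbeat-kube-7.13.1-2022.02.14-000001/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```

Note that this only affects the existing index; the index template must also carry the setting, or the next rollover index reverts to the default limit of 1000.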

In the end, we had to revert to version 7.2...

We would appreciate your assistance.

Hello @cgnusr01 ,

What's the actual error message after increasing the value?

Is metricbeat set up to write to the default index?
Did you update the mappings at index level or at index template level?

Could you detail the setup you have?

The documents in the error seem to come from the system module rather than from the kubernetes.pod metricset.

Hello @Andrea_Spacca

What's the actual error message after increasing the value?

The actual error message in this case was:

{
  "took": 491,
  "timed_out": false,
  "_shards": {
    "total": 12,
    "successful": 11,
    "skipped": 11,
    "failed": 1,
    "failures": [
      {
        "shard": 0,
        "index": "metricbeat-kube-7.13.1-2022.02.14-000001",
        "node": "ti0MftEaQk2lV0VMglBfTA",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:100)",
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:28)",
            "doc['kubernetes.replicaset.replicas.desired'].empty ? false:\n    (doc['kubernetes.replicaset.replicas.ready'].empty? false: doc['kubernetes.replicaset.replicas.desired'].value!=doc['kubernetes.replicaset.replicas.ready'].value)\n    \n ",
            "    ^---- HERE"
          ],
          "script": "doc['kubernetes.replicaset.replicas.desired'].empty ? false: ...",
          "lang": "painless",
          "position": {
            "offset": 4,
            "start": 0,
            "end": 234
          },
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "No field found for [kubernetes.replicaset.replicas.desired] in mapping"
          }
        }
      }
    ]
  },
  "hits": {
    "max_score": null,
    "hits": []
  }
}

Is metricbeat set up to write to the default index?

Yes

Did you update the mappings at index level or at index template level?

No, we did not interfere with these at all.

The Kubernetes ConfigMap holding the Metricbeat properties is this:

    metricbeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false
    processors:
      - add_cloud_metadata:
      - if:
          or:
            - equals.system.network.name: "ens3f2"
            - equals.system.network.name: "ens3f3"
            - equals.system.network.name: "bond1.741"
        then:
          - add_fields:
              fields:
                vlan: "741"
        else:
          - if:
              or:
                - equals.system.network.name: "ens3f4"
                - equals.system.network.name: "ens3f5"
                - equals.system.network.name: "bond2.751"
            then:
              - add_fields:
                  fields:
                    vlan: "751"
            else:
              - drop_event:
                  when:
                    has_fields: ['system.network.name']
      - drop_event:
          when:
            regexp:
              system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib|hostfs|run)($|/)'
    output.elasticsearch:
      hosts: ["172.28.162.21:9200","172.28.162.22:9200","172.28.162.23:9200"]
      loadbalance: true
      protocol: "https"
      username: "${ES_USERNAME}"
      password: "${ES_PWD}"
      ssl:
       certificate_authorities: ["/etc/elasticsearch-ca.pem"]
       verification_mode: "none"
  
    setup.template:
      name: 'metricbeat-kube-%{[agent.version]}'
      pattern: 'metricbeat-kube-%{[agent.version]}*'
      enabled: false
      settings:
        index.number_of_shards: 1
        index.number_of_replicas: 1
        index.codec: best_compression
 
    setup.ilm.enabled: true
    setup.ilm.policy_name: 'metricbeat-kube-%{[agent.version]}'
    #rollover_alias does not support variables (agent.version) due to https://github.com/elastic/beats/issues/12233 so we set it explicitly
    setup.ilm.rollover_alias: 'metricbeat-kube-7.13.1'
    tags: ["backend"]
    fields:
      env: ${ENVIRONMENT}
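As an aside, the nested if/else processors above encode a small decision tree: interfaces on one bond get tagged with VLAN 741, interfaces on the other with VLAN 751, and events for any other named interface are dropped. A Python sketch of that logic, for illustration only (the function and event structure are our own, not Metricbeat code):

```python
# Illustration of the decision logic encoded by the nested `if`
# processors in the ConfigMap above.

VLAN_741_IFACES = {"ens3f2", "ens3f3", "bond1.741"}
VLAN_751_IFACES = {"ens3f4", "ens3f5", "bond2.751"}


def process(event):
    """Return the (possibly tagged) event, or None if it is dropped."""
    name = event.get("system", {}).get("network", {}).get("name")
    if name in VLAN_741_IFACES:
        # add_fields puts custom fields under "fields" by default
        event.setdefault("fields", {})["vlan"] = "741"
    elif name in VLAN_751_IFACES:
        event.setdefault("fields", {})["vlan"] = "751"
    elif name is not None:
        # drop_event: any other named interface is discarded
        return None
    # events without system.network.name pass through untouched
    return event
```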

hi @cgnusr01

this error indicates that the issue with the limit on total fields in the mapping was solved.

It is rather an error in the ingestion pipeline: I cannot find this script source in the beats default pipelines; did you create it?

You can try changing the pipeline code to use the null-safe operator (Operators: Reference | Painless Scripting Language [7.13] | Elastic):
doc.kubernetes?.replicaset?.replicas?.desired?.empty
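Note that in a search-time script `doc` is keyed by the full dotted field name, so another way to guard against an unmapped field is `doc.containsKey(...)`. A sketch of the script from the error rewritten that way (an assumption on our part, not tested against this cluster):

```painless
// Guard with containsKey so that a field absent from the mapping
// does not throw illegal_argument_exception on doc['...'] access.
doc.containsKey('kubernetes.replicaset.replicas.desired')
  && doc.containsKey('kubernetes.replicaset.replicas.ready')
  && !doc['kubernetes.replicaset.replicas.desired'].empty
  && !doc['kubernetes.replicaset.replicas.ready'].empty
  && doc['kubernetes.replicaset.replicas.desired'].value
     != doc['kubernetes.replicaset.replicas.ready'].value
```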

@Andrea_Spacca thanks so much for your prompt response. We haven't created or changed anything in the default pipeline of the Metricbeat Kubernetes module. And when we go back to 7.2, it simply works...

@cgnusr01
did you run metricbeat setup from the new version?

that should update the pipelines
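For reference, the setup step is typically run once against the cluster with the same configuration as the running Beat, along these lines (a sketch; the config path is an example, and the exact flags depend on the 7.x version in use):

```shell
# One-off run to load the index template, ILM policy, and other
# setup assets for the new Metricbeat version.
metricbeat setup --index-management -e -c /etc/metricbeat.yml
```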

@Andrea_Spacca actually we haven't. I see now that setup.template.enabled: false is set in the ConfigMap. We will try it out.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.