Filebeat's auto-mapping of Kubernetes labels causing big issues


(Raman Gupta) #1

Filebeat 6.4.2, with Kubernetes autodiscovery, is failing to index most of my Kubernetes logs, due to errors like this:

(status=400): {"type":"mapper_parsing_exception","reason":"object mapping for [kubernetes.labels.app] tried to parse field [app] as object, but found a concrete value"}

The full log message is:

2018-10-30T16:36:33.263Z	WARN	elasticsearch/client.go:520	Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0x30bf59ea, ext:63676514191, loc:(*time.Location)(nil)}, Meta:common.MapStr(nil), Fields:common.MapStr{"input":common.MapStr{"type":"docker"}, "kubernetes":common.MapStr{"replicaset":common.MapStr{"name":"echoheaders-5987c755c7"}, "labels":common.MapStr{"tier":"canary", "app":"echoheaders", "pod-template-hash":"5987c755c7"}, "pod":common.MapStr{"name":"echoheaders-5987c755c7-cvfdm"}, "node":common.MapStr{"name":"k8s-agentpool2-36147952-vmss000000"}, "container":common.MapStr{"name":"echoheaders"}, "namespace":"test"}, "beat":common.MapStr{"hostname":"filebeat-9bltm", "version":"6.4.2", "name":"filebeat-9bltm"}, "meta":common.MapStr{"cloud":common.MapStr{"instance_id":"aa62124d-cc24-44fd-b374-ebc6d78e81e5", "instance_name":"k8s-agentpool2-36147952-vmss_0", "machine_type":"Standard_DS5_v2_Promo", "region":"CanadaEast", "provider":"az"}}, "source":"/var/lib/docker/containers/827376aa048afe6ec0fde1bd524c3a58d8a679ecfa1fe2200d509a706888bb24/827376aa048afe6ec0fde1bd524c3a58d8a679ecfa1fe2200d509a706888bb24-json.log", "stream":"stdout", "prospector":common.MapStr{"type":"docker"}, "message":"10.5.1.3 - - [30/Oct/2018:16:36:31 +0000] \"GET /healthz HTTP/1.1\" 200 436 \"-\" \"kube-probe/1.12\"", "offset":49925555, "host":common.MapStr{"name":"filebeat-9bltm"}}, Private:file.State{Id:"", Finished:false, Fileinfo:(*os.fileStat)(0xc420434820), Source:"/var/lib/docker/containers/827376aa048afe6ec0fde1bd524c3a58d8a679ecfa1fe2200d509a706888bb24/827376aa048afe6ec0fde1bd524c3a58d8a679ecfa1fe2200d509a706888bb24-json.log", Offset:49925727, Timestamp:time.Time{wall:0xbeee3c94a77577c5, ext:3049220432, loc:(*time.Location)(0x1f61860)}, TTL:-1, Type:"docker", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x17a225, Device:0x801}}}, Flags:0x1} (status=400): {"type":"mapper_parsing_exception","reason":"object mapping for 
[kubernetes.labels.app] tried to parse field [app] as object, but found a concrete value"}

Almost all of my containers have an app label, so this is preventing log ingestion on a massive scale.

I checked the index mapping and as far as I can tell, there shouldn't be an issue -- the kubernetes.labels.app field is a keyword type, not an object type. The specific mapping is:

"app": {
    "ignore_above": 1024,
    "type": "keyword"
},

Here is the entire mapping: https://pastebin.com/raw/ySZv1MMK

UPDATE: Ok, I realized I was looking at yesterday's index mapping. Today's mapping does contain an object structure, which explains the error:

"app": {
    "properties": {
        "kubernetes": {
            "properties": {
                "io/name": {
                    "type": "keyword",
                    "ignore_above": 1024
                },
                "io/part-of": {
                    "type": "keyword",
                    "ignore_above": 1024
                }
            }
        }
    }
},

The default index template I exported from filebeat doesn't have this field. I suspect it was auto-mapped because one of my pods has these labels:

"app.kubernetes.io/name": "default-http-backend",
"app.kubernetes.io/part-of": "ingress-nginx",
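If I understand it correctly, Elasticsearch treats the dots in the label key as path separators when it builds the dynamic mapping, so app.kubernetes.io/name becomes the object path app → kubernetes → io/name, which then collides with the plain app keyword field from other pods. A minimal Python sketch of that expansion (my approximation of the behavior, not Elasticsearch's actual code):

```python
def expand_dots(labels):
    """Expand dotted keys into nested dicts, approximating how
    Elasticsearch interprets dots in dynamically mapped field names."""
    result = {}
    for key, value in labels.items():
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result

# A plain "app" label maps to a string value...
expand_dots({"app": "echoheaders"})
# → {'app': 'echoheaders'}

# ...but "app.kubernetes.io/name" turns "app" into an object,
# conflicting with the keyword mapping created for the first case.
expand_dots({"app.kubernetes.io/name": "ingress-nginx"})
# → {'app': {'kubernetes': {'io/name': 'ingress-nginx'}}}
```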

Parsing label keys this way seems amazingly brittle and error-prone. How can I prevent it? A rename processor?

Secondly, I'd be happy if the fields that didn't match were simply ignored (with an annotation like the one Logstash creates), while the rest of the content is indexed. Dropping the entire message because of one mismatched field in the middle seems nuts for this use case.

Regards,
Raman


(Raman Gupta) #2

I used the drop_fields processor to drop the offending fields (https://www.elastic.co/guide/en/beats/filebeat/current/drop-fields.html) and that seems to have worked around the problem.

UPDATE: No, sometimes it works and sometimes it doesn't. Honestly I have no idea WTF is going on. In some cases I see the error:

2018-10-31T00:08:30.229Z	WARN	elasticsearch/client.go:520	Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0x2d84f440, ext:63676541307, loc:(*time.Location)(nil)}, Meta:common.MapStr(nil), Fields:common.MapStr{"meta":common.MapStr{"cloud":common.MapStr{"region":"CanadaEast", "instance_id":"8ff61563-9fb6-4194-80e8-e90f66195a75", "provider":"az", "instance_name":"k8s-agentpool1-36147952-vmss_3", "machine_type":"Standard_DS3_v2_Promo"}}, "offset":5362740, "kubernetes":common.MapStr{"pod":common.MapStr{"name":"nginx-ingress-controller-866568b999-6jf4n"}, "node":common.MapStr{"name":"k8s-agentpool1-36147952-vmss000003"}, "container":common.MapStr{"name":"nginx-ingress-controller"}, "namespace":"ingress-nginx", "replicaset":common.MapStr{"name":"nginx-ingress-controller-866568b999"}, "labels":common.MapStr{"pod-template-hash":"866568b999", "app":common.MapStr{"kubernetes":common.MapStr{"io/part-of":"ingress-nginx", "io/name":"ingress-nginx"}}}}, "host":common.MapStr{"name":"filebeat-4jlpm"}, "input":common.MapStr{"type":"docker"}, "prospector":common.MapStr{"type":"docker"}, "beat":common.MapStr{"name":"filebeat-4jlpm", "hostname":"filebeat-4jlpm", "version":"6.4.2"}, "source":"/var/lib/docker/containers/9040d6696a05bd715c7d6ee67cef66ad358ce1310bb43a630928686b336b5074/9040d6696a05bd715c7d6ee67cef66ad358ce1310bb43a630928686b336b5074-json.log", "stream":"stderr", "message":"I1031 00:08:27.763128       7 controller.go:953] Ingress \"default/default-http-backend-ingress\" defines a backend but no rule. 
Using it to configure the catch-all server \"_\""}, Private:file.State{Id:"", Finished:false, Fileinfo:(*os.fileStat)(0xc420508dd0), Source:"/var/lib/docker/containers/9040d6696a05bd715c7d6ee67cef66ad358ce1310bb43a630928686b336b5074/9040d6696a05bd715c7d6ee67cef66ad358ce1310bb43a630928686b336b5074-json.log", Offset:5362985, Timestamp:time.Time{wall:0xbeee5a5f6515de7a, ext:1510612698, loc:(*time.Location)(0x1f61860)}, TTL:-1, Type:"docker", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0xfa1ad, Device:0x801}}}, Flags:0x1} (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse [kubernetes.labels.app]","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 1:408"}}

This is my processor definition:

    processors:
      - add_cloud_metadata: ~
      # - rename:
      #   fields:
      #     - from: "app.kubernetes.io/name"
      #       to: "app"
      #     - from: "app.kubernetes.io/part-of"
      #       to: "part.of"
      #   ignore_missing: true
      #   fail_on_error: false
      - drop_fields:
          fields: ["app.kubernetes.io/name", "app.kubernetes.io/part-of"]

I can't get rename working either, but I created a separate post for that.

Right now I'd just be happy with a consistent workaround, but I'd also still be interested in knowing how to just ignore fields that don't map properly (with an annotation like the one Logstash creates), while the rest of the content is indexed.

Then one could periodically search for the annotation, figure out which field did not map, and add the appropriate drop or rename actions to the processor configuration. In the meantime, almost all of the data would be available and searchable.


(Jaime Soriano) #3

Hi @rocketraman,

I am sorry to read that you are having these problems; we are thinking about ways to improve this in the future.

For now you can work around this issue using the drop_fields and/or rename processors, but take into account that you need to use the full field name (e.g. kubernetes.labels.app.kubernetes.io/name instead of just app.kubernetes.io/name).
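As a sketch (untested), a drop_fields configuration using the fully-qualified names would look something like:

```yaml
processors:
  - drop_fields:
      fields:
        - "kubernetes.labels.app.kubernetes.io/name"
        - "kubernetes.labels.app.kubernetes.io/part-of"
```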


(Raman Gupta) #4

@jsoriano Thanks, I've modified the configuration accordingly, but still get the error I mentioned previously:

(status=400): {"type":"mapper_parsing_exception","reason":"failed to parse [kubernetes.labels.app]","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 1:1089"}}

See my post above for the full error.

My processors value is now:

    processors:
      - add_cloud_metadata: ~
      - rename:
          fields:
            - from: "kubernetes.labels.app.kubernetes.io/name"
              to: "kubernetes.labels.app"
            - from: "kubernetes.labels.app.kubernetes.io/part-of"
              to: "kubernetes.labels.part-of"
          ignore_missing: true
          fail_on_error: false
      #- drop_fields:
      #    fields: ["kubernetes.labels.app.kubernetes.io/name", "kubernetes.labels.app.kubernetes.io/part-of"]

I have also tried drop_fields (commented out above), and get the same error.


(Jaime Soriano) #5

Is it possible that your Kubernetes objects contain more labels starting with app?


(Raman Gupta) #6

I've dumped the JSON of every pod in the cluster and have the following observations:

  1. No pod has a label app, or a label ending in app, whose value is anything other than textual data:
kubectl get pods --all-namespaces -o json | grep -E 'app": [^"]'
<no output>
  2. The only pods with compound label names containing app that might be interpreted as objects have the same labels we have already considered (though there are two such pods):
kubectl get pods --all-namespaces -o json | grep -E "app\..*"
                    "app.kubernetes.io/name": "default-http-backend",
                    "app.kubernetes.io/part-of": "ingress-nginx",
                    "app.kubernetes.io/name": "ingress-nginx",
                    "app.kubernetes.io/part-of": "ingress-nginx",

Is it possible that the processor is trying to do some parsing of "app.kubernetes.io/x" before it gets to the drop/rename stage, and failing there?


(Jaime Soriano) #7

I guess you still have something in your index or your mappings for kubernetes.labels.app. You could try another name in the rename to avoid possible conflicts there, something like:

      - rename:
          fields:
            - from: 'kubernetes.labels.app.kubernetes.io/name'
              to: 'kubernetes.labels.app_name'
            - from: 'kubernetes.labels.app.kubernetes.io/part-of'
              to: 'kubernetes.labels.app_part_of'
          ignore_missing: true
          fail_on_error: false
      - drop_fields:
          fields:
            - 'kubernetes.labels.app'

Is it possible that the processor is trying to do some parsing of "app.kubernetes.io/x" before it gets to the drop/rename stage, and failing there?

I don't think so; the processors are executed by filebeat before sending the event, but the error you are receiving looks like an error from Elasticsearch.


(Raman Gupta) #9

To try to understand what is happening, I tried another configuration that renames kubernetes.labels.app as well:

    processors:
      - add_cloud_metadata: ~
      - rename:
          fields:
            - from: 'kubernetes.labels.app.kubernetes.io/name'
              to: 'kubernetes.labels.app_name'
            - from: 'kubernetes.labels.app.kubernetes.io/part-of'
              to: 'kubernetes.labels.app_part_of'
            - from: 'kubernetes.labels.app'
              to: 'kubernetes.labels.app_name'
          ignore_missing: true
          fail_on_error: false
      #- drop_fields:
      #    fields:
      #      - 'kubernetes.labels.app'

In the WARN message from filebeat, one can see that the structure still contains the app field, holding an empty kubernetes object that is left behind after its member fields are renamed:

common.MapStr{"app_name":"ingress-nginx", "app_part_of":"ingress-nginx", "pod-template-hash":"866568b999", "app":common.MapStr{"kubernetes":common.MapStr{}}}

So the final solution is kind of ridiculous, but it works:

    processors:
      - add_cloud_metadata: ~
      - rename:
          when:
            has_fields: ['kubernetes.labels.app.kubernetes.io/name']
          fields:
            - from: 'kubernetes.labels.app'
              to: 'kubernetes.labels.appobject'
          ignore_missing: true
          fail_on_error: false
      - rename:
          when:
            has_fields: ['kubernetes.labels.appobject']
          fields:
            - from: 'kubernetes.labels.appobject.kubernetes.io/name'
              to: 'kubernetes.labels.app'
            - from: 'kubernetes.labels.appobject.kubernetes.io/part-of'
              to: 'kubernetes.labels.part-of'
          ignore_missing: true
          fail_on_error: false
      - drop_fields:
          when:
            has_fields: ['kubernetes.labels.appobject']
          fields:
            - 'kubernetes.labels.appobject'

(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.