Parse JSON logs from only certain Kubernetes deployments

benjamingorman · November 27, 2018, 3:58pm

Hi!

I have a question about parsing JSON log messages produced by Kubernetes deployments. I've already seen this thread, this page in the docs, and this page.

However, none of those seem relevant. The problem is that some apps in our cluster log in JSON and some don't, so I need a way to tell filebeat to only try to parse JSON logs for a certain set of apps.

I tried adding this config to filebeat.yml, but it leads to errors when parsing log messages that aren't JSON-formatted:

 - decode_json_fields:
    fields: ["message"]
    process_array: true
    max_depth: 10
    overwrite_keys: false

Here are the ConfigMaps for our filebeat setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.config:
      inputs:
        # Mounted `filebeat-inputs` configmap:
        path: /usr/share/filebeat/inputs.d/*.yml
        # Reload inputs configs as they change:
        reload.enabled: false
      modules:
        path: /usr/share/filebeat/modules.d/*.yml
        # Reload module configs as they change:
        reload.enabled: false

    output.elasticsearch:
      hosts: ['...:9200']
      username: ...
      password: ...
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-inputs
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  kubernetes.yml: |-
    - type: docker
      containers.ids:
      - "*"
      processors:
        - add_kubernetes_metadata:
            in_cluster: true

Is there a way that I can specify different processors depending on the source container/app? That would solve the problem because then I could add the JSON processor only for certain apps.
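One option the thread does not try is a processor condition. This is a minimal sketch, assuming Filebeat 6.x `when` conditions on processors. The existing docker input in the `kubernetes.yml` ConfigMap keeps collecting every container but only runs `decode_json_fields` on events whose Kubernetes metadata matches. The label value `my-json-app` is a hypothetical placeholder. Processors run in the order listed, so `add_kubernetes_metadata` comes first to make `kubernetes.labels.app` available when the condition is checked.

```yaml
# Contents of kubernetes.yml in the filebeat-inputs ConfigMap (sketch).
- type: docker
  containers.ids:
  - "*"
  processors:
    # Runs first so kubernetes.* fields exist for the condition below.
    - add_kubernetes_metadata:
        in_cluster: true
    # Only decode "message" for pods labelled app=my-json-app (hypothetical value).
    - decode_json_fields:
        fields: ["message"]
        process_array: true
        max_depth: 10
        overwrite_keys: false
        when:
          equals:
            kubernetes.labels.app: "my-json-app"
```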

benjamingorman · November 27, 2018, 4:21pm

I've realized that it's possible to add annotations on pods as hints to filebeat (https://www.elastic.co/guide/en/beats/filebeat/6.4/configuration-autodiscover-hints.html).

I've tried adding some configuration like this:

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true
          include_annotations: ["json_logs"]
          templates:
            - condition:
              contains:
                  kubernetes.annotations.json_logs: "true"
              config:
              - processors:
                  decode_json_fields:
                    fields: ["message"]

This isn't working (it seems to have no effect at all), but it's a step in the right direction. If I can just add an annotation to each pod that logs in JSON format and get filebeat to recognize it, this will be a perfect solution.
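For the annotation approach, here is a minimal sketch of where the annotation could go, assuming the app is deployed as a standard Deployment. Annotations under `spec.template.metadata` are copied onto every pod the Deployment creates. Those pod annotations are what show up as `kubernetes.annotations.json_logs` in the autodiscover events. The name `my-json-app` and the image are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-json-app            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-json-app
  template:
    metadata:
      labels:
        app: my-json-app
      annotations:
        # Annotation values must be strings, hence the quotes.
        json_logs: "true"
    spec:
      containers:
        - name: my-json-app
          image: example.com/my-json-app:latest   # hypothetical image
```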

benjamingorman · November 27, 2018, 5:39pm

I've enabled autodiscover debug logs (running filebeat with -d autodiscover) and I'm seeing this in the logs:

2018-11-27T17:05:49.719Z        DEBUG   [autodiscover]  autodiscover/autodiscover.go:162        Got a start event: map[port:8080 kubernetes:{"annotations":{"json_logs":"true", ... REDACTED ... }}
2018-11-27T17:05:38.746Z        DEBUG   [autodiscover]  autodiscover/autodiscover.go:235        Got a meta field in the event
2018-11-27T17:05:38.746Z        DEBUG   [autodiscover]  cfgfile/list.go:62      Starting reload procedure, current runners: 11
2018-11-27T17:05:38.746Z        DEBUG   [autodiscover]  cfgfile/list.go:80      Start list: 2, Stop list: 0
2018-11-27T17:05:38.746Z        ERROR   [autodiscover]  cfgfile/list.go:96      Error creating runner from config: No paths were defined for input accessing config
2018-11-27T17:05:38.746Z        ERROR   [autodiscover]  cfgfile/list.go:96      Error creating runner from config: No paths were defined for input accessing config

Any suggestions as to why this error might occur?

jsoriano · November 30, 2018, 3:07pm

Hi @benjamingorman, and welcome!

I think you are pretty close to making it work. There are only a couple of issues with your autodiscover configuration.

The first one may just come from copying the config here, but note that the template condition must be indented one more level, like this:

            - condition:
                contains:
                  kubernetes.annotations.json_logs: "true"

The second one is that the configuration provided in the template is incomplete: you need to provide a whole input configuration. By default filebeat tries to use the log input, but that input needs paths, and none are defined. This is why you see these errors.

In this case you probably want to use a docker input, so the configuration would look like this:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            equals:
              kubernetes.annotations.json_logs: "true"
          config:
            - type: docker
              containers.ids:
                - "${data.kubernetes.container.id}"
              processors:
                decode_json_fields:
                  fields: ["message"]

You can find more information about the kubernetes autodiscover provider in the documentation.

I hope this helps.

benjamingorman · December 3, 2018, 11:09am

@jsoriano thanks for the response - that's really helpful.

I've replaced my config with your suggestions. I'm still seeing some errors in filebeat's log though.
Any suggestions as to how to go about debugging this?

2018-12-03T10:49:58.334Z	DEBUG	[autodiscover]	autodiscover/autodiscover.go:136	Reloading existing autodiscover configs after error
2018-12-03T10:49:58.334Z	DEBUG	[autodiscover]	cfgfile/list.go:62	Starting reload procedure, current runners: 21
2018-12-03T10:49:58.334Z	ERROR	[autodiscover]	cfgfile/list.go:96	Error creating runner from config: No paths were defined for input accessing config

I'll post my entire filebeat configmap again:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.config:
      inputs:
        # Mounted `filebeat-inputs` configmap:
        path: /usr/share/filebeat/inputs.d/*.yml
        # Reload inputs configs as they change:
        reload.enabled: false
      modules:
        path: /usr/share/filebeat/modules.d/*.yml
        # Reload module configs as they change:
        reload.enabled: false

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          templates:
            - condition:
                equals:
                  kubernetes.annotations.json_logs: "true"
              config:
                - type: docker
                  containers.ids:
                    - "${data.kubernetes.container.id}"
                  processors:
                    decode_json_fields:
                      fields: ["message"]
                      process_array: true
                      max_depth: 10
                      overwrite_keys: false
                    add_kubernetes_metadata:
                      in_cluster: true

    cloud.id: ""
    cloud.auth: ""

    output.elasticsearch:
      hosts: ['REDACTED:9200']
      username: ...
      password: ...

jsoriano · December 3, 2018, 11:40am

Sorry, there was a typo in my suggested config: processors must be a list, so it would be:

                  processors:
                    - decode_json_fields:
                        fields: ["message"]
                        process_array: true
                        max_depth: 10
                        overwrite_keys: false
                    - add_kubernetes_metadata:
                        in_cluster: true

The error seems to point to a missing path in some input config. This can happen if the log input is used, but that doesn't seem to be the case here. Are you also adding module configuration files?

benjamingorman · December 3, 2018, 1:34pm

@jsoriano after I removed the filebeat.config.inputs and filebeat.config.modules sections from my config, the problem seems to have resolved itself.

With the typo in the autodiscover config fixed too, I can see that filebeat is taking logs from my app and attempting to use the JSON processor, which is great!

However, there's still an issue, which seems to be a parse error.
Here's an example log message that my app is producing:

{"level":"info","msg":"GET /, response 404 Not Found","time":"2018-12-03T13:28:02Z"}

Filebeat seems to be unable to parse this as JSON; there's an error in the filebeat logs:

2018-12-03T13:28:08.151Z        WARN    elasticsearch/client.go:520     Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0x1fee06a7, ext:63679440482, loc:(*time.Location)(nil)}, Meta:common.MapStr(nil), Fields:common.MapStr{"stream":"stdout", "message":map[string]interface {}{"msg":"GET /, response 404 Not Found", "time":"2018-12-03T13:28:02Z", "level":"info"}, "prospector":common.MapStr{"type":"docker"}, "input":common.MapStr{"type":"docker"}, "beat":common.MapStr{"hostname":"filebeat-w4xbv", "version":"6.5.0", "name":"filebeat-w4xbv"}, "host":common.MapStr{"name":"filebeat-w4xbv"}, "source":"/var/lib/docker/containers/05d195f77a281969b217e1318f36b61123c627eccffd76133cd9ea34a91f1554/05d195f77a281969b217e1318f36b61123c627eccffd76133cd9ea34a91f1554-json.log", "offset":326, "kubernetes":common.MapStr{"namespace":"default", "replicaset":common.MapStr{"name":"REDACTED"}, "labels":common.MapStr{"app":"REDACTED", "pod-template-hash":"3803567199"}, "pod":common.MapStr{"name":"REDACTED"}, "node":common.MapStr{"name":"REDACTED"}, "container":common.MapStr{"name":"REDACTED"}}}, Private:file.State{Id:"", Finished:false, Fileinfo:(*os.fileStat)(0xc420020680), Source:"/var/lib/docker/containers/05d195f77a281969b217e1318f36b61123c627eccffd76133cd9ea34a91f1554/05d195f77a281969b217e1318f36b61123c627eccffd76133cd9ea34a91f1554-json.log", Offset:493, Timestamp:time.Time{wall:0xbef969b9c86ec4a1, ext:4462380904768, loc:(*time.Location)(0x20ed1e0)}, TTL:-1, Type:"docker", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x2744cc, Device:0x801}}}, Flags:0x1} (status=400): {"type":"mapper_parsing_exception","reason":"failed to parse field [message] of type [text]","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 1:208"}}

(The important bit seems to be {"type":"mapper_parsing_exception","reason":"failed to parse field [message] of type [text]","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 1:208"})

Just googling around to see if this is a common issue.

benjamingorman · December 3, 2018, 2:11pm

Ah, the issue seems to be that the message field in the Elasticsearch index I'm using is mapped as text rather than object. The solution I found was to use:

- decode_json_fields:
    fields: ["message"]
    target: "json_message"
    ...

Since the json_message field did not already exist in the index, this worked fine.

I suppose another solution would be to change the type of the message field in the index to object and convert all existing records, but I decided just to create a new field.

@jsoriano thanks again for your help.

If anyone else reading this thread is having a similar issue, feel free to send me a message.
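Putting the fixes from this thread together, the `filebeat.yml` key of the filebeat-config ConfigMap would look roughly like the sketch below. It is assembled from the posts above, not copied from a tested setup. The `filebeat.config` sections are dropped, `processors` is a list, and `decode_json_fields` writes to `json_message` so it does not collide with the text mapping of `message`. The second template is an assumption that goes beyond the thread. With only the first template, pods without the annotation would not be collected at all, so the sketch adds a `not` condition that ships their logs undecoded.

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        # Pods annotated json_logs: "true" get the JSON processor.
        - condition:
            equals:
              kubernetes.annotations.json_logs: "true"
          config:
            - type: docker
              containers.ids:
                - "${data.kubernetes.container.id}"
              processors:
                # Decode into a new field so the existing text mapping
                # of "message" is left alone.
                - decode_json_fields:
                    fields: ["message"]
                    target: "json_message"
                    process_array: true
                    max_depth: 10
                    overwrite_keys: false
                - add_kubernetes_metadata:
                    in_cluster: true
        # Assumption (not from the thread): collect all other pods without decoding.
        - condition:
            not:
              equals:
                kubernetes.annotations.json_logs: "true"
          config:
            - type: docker
              containers.ids:
                - "${data.kubernetes.container.id}"
              processors:
                - add_kubernetes_metadata:
                    in_cluster: true

# output.elasticsearch: unchanged from the ConfigMap above (hosts, username, password).
```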

system · December 31, 2018, 2:11pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
