Filebeat json.message_key tries to parse string as json

Hello,

When trying to ship logs from my Kubernetes cluster on AKS running Filebeat 7.5, I'm running into the following error messages:

    Error decoding JSON: invalid character 'N' looking for beginning of value
    Error decoding JSON: invalid character '3' after array element
    Error decoding JSON: json: cannot unmarshal number into Go value of type map[string]interface {}
    Error decoding JSON: invalid character 'T' looking for beginning of value

My logs are formatted in the following way and are actually multiline logs:

    {"log":"[03-02 10:30:41.785] DEBUG station-xapi-httpListenerConfig [http.listener.02 SelectorRunner] [event: ]: LISTENER\n","stream":"stdout","time":"2021-03-02T10:30:41.858565487Z"}
    {"log":"0\r\n","stream":"stdout","time":"2021-03-02T10:30:41.858570287Z"}
    {"log":"\r\n","stream":"stdout","time":"2021-03-02T10:30:41.858575087Z"}
    {"log":"\n","stream":"stdout","time":"2021-03-02T10:30:41.858580387Z"}
    {"log":"[03-02 10:30:52.528] DEBUG station-xapi-httpListenerConfig [http.listener.01 SelectorRunner] [event: b52f3b20-7535-11eb-b9dc-8e475bcaae00]: LISTENER\n","stream":"stdout","time":"2021-03-02T10:30:52.575037037Z"}
    {"log":"GET / HTTP/1.1\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575085938Z"}
    {"log":"Host: contoso.net\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575091438Z"}
    {"log":"\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575094838Z"}
    {"log":"\n","stream":"stdout","time":"2021-03-02T10:30:52.575098038Z"}
    {"log":"[03-02 10:30:52.529] DEBUG station-xapi-httpListenerConfig [http.listener.01 SelectorRunner] [event: b52f3b20-7535-11eb-b9dc-8e475bcaae00]: LISTENER\n","stream":"stdout","time":"2021-03-02T10:30:52.575101238Z"}
    {"log":"HTTP/1.1 404 Not Found\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575104638Z"}
    {"log":"Content-Type: text/plain\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575108638Z"}
    {"log":"Date: Tue, 02 Mar 2021 10:30:52 GMT\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575112338Z"}
    {"log":"Transfer-Encoding: chunked\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575115538Z"}
    {"log":"\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575119338Z"}
    {"log":"1b\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575122538Z"}
    {"log":"No listener for endpoint: /\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575125738Z"}
    {"log":"\n","stream":"stdout","time":"2021-03-02T10:30:52.575128738Z"}
    {"log":"[03-02 10:30:52.529] DEBUG station-xapi-httpListenerConfig [http.listener.01 SelectorRunner] [event: b52f3b20-7535-11eb-b9dc-8e475bcaae00]: LISTENER\n","stream":"stdout","time":"2021-03-02T10:30:52.575131739Z"}
    {"log":"0\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575135039Z"}
    {"log":"\r\n","stream":"stdout","time":"2021-03-02T10:30:52.575137939Z"}
    {"log":"\n","stream":"stdout","time":"2021-03-02T10:30:52.575140939Z"}

Even without any multiline config it throws the errors above. It seems that the JSON decoding also tries to parse whatever is inside the "log" field as JSON? What else can I try? To be clear, I have no control over the format above (not even over what ends up inside the "log" field).

The relevant part of my Filebeat config looks like this:

    filebeat.autodiscover:
      providers:
          - type: kubernetes
            host: ${NODE_NAME}
            templates:
              - condition:
                  contains:
                    kubernetes.container.name: "app"
                    kubernetes.namespace: "d6023b10-b4b6-4c0c-990c-70d85d507fe9"
                config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
                    json.keys_under_root: true
                    json.add_error_key: true
                    json.message_key: log
                    #json.max_depth:
                    fields:
                      env: dev
                    fields_under_root: true
                    #multiline.pattern: '^\[[0-9]{2}-[0-9]{2}\s+[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\]\s+(INFO|DEBUG|WARN|ERROR|TRACE).*$'
                    #multiline.negate: true
                    #multiline.match: after

Thanks in advance!

Could you please post the original logs (before parsing)? What you've posted are actually separate log lines that can't be combined into a single log message.

Hi Marcin,

So it's not possible to combine everything inside multiple "log" fields with multiline? I was under the impression it would work the same as the non-JSON parser (i.e., reading until the multiline pattern matches the next line).

This is before parsing; this is how the json-file logging driver used by Azure AKS writes to the container log files. The pattern [03-02 10:30:52.529] DEBUG indicates a new log entry, so whenever the "log" field starts with that pattern, a new log entry should be created and the previous one closed.
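
To make it concrete: the second entry above (the GET request) should, as I understand multiline, end up as a single event whose message is roughly:

    [03-02 10:30:52.528] DEBUG station-xapi-httpListenerConfig [http.listener.01 SelectorRunner] [event: b52f3b20-7535-11eb-b9dc-8e475bcaae00]: LISTENER
    GET / HTTP/1.1
    Host: contoso.net

with the trailing blank lines folded in as well.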

Also, why does it complain when I ask it to read the "log" field even though multiline isn't enabled? That should be a simple "read string and push to Elasticsearch".

No, you're wrong. The multiline option operates on a different level: first chop the input stream into log messages, then apply the other config options.

See: Manage multiline messages | Filebeat Reference [7.11] | Elastic

It doesn't say anything about JSON logs. You could try converting this log file into a standard plain-text file and then apply further processing. With JSON logs, you're assuming that the "log" key already contains a potentially multiline message.

Hi Marcin,

I'm referring to the documentation here: Container input | Filebeat Reference [7.11] | Elastic

It states that you can apply multiline after JSON decoding by using message_key to indicate where the log line sits inside the JSON object. That's what I'm trying to achieve here, and it says the only requirement is one JSON object per line, which is the case.
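
As I read it, the documented combination would look something like this (a sketch only, reusing the pattern I have commented out above):

    json.keys_under_root: true
    json.add_error_key: true
    json.message_key: log
    multiline.pattern: '^\[[0-9]{2}-[0-9]{2}\s+[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\]\s+(INFO|DEBUG|WARN|ERROR|TRACE)'
    multiline.negate: true
    multiline.match: after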

But even if we put the multiline discussion aside, having just:

    json.keys_under_root: true
    json.add_error_key: true
    json.message_key: log

in my config should, in theory, send whatever is inside the "log" field to my output, and it's not doing that. Looking at the error messages, it really seems that Filebeat tries to decode the contents of the "log" field itself as JSON, which I don't understand. I thought the "json.*" options were there to tell it to interpret each log line as JSON, so that it can use message_key to extract the field that holds the real log message.
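
If that is what's happening, then one experiment I could run (a sketch, assuming the docker input already unwraps the json-file format by itself, as the Container input docs suggest) would be to drop the json.* options entirely and let multiline operate on the extracted log lines:

    - type: docker
      containers.ids:
        - "${data.kubernetes.container.id}"
      multiline.pattern: '^\[[0-9]{2}-[0-9]{2}\s+[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\]\s+(INFO|DEBUG|WARN|ERROR|TRACE)'
      multiline.negate: true
      multiline.match: after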

The json processor has something called max_depth, but I don't see it in the documentation linked above for the container/docker JSON input, so I guess it is not supported for the JSON input options. I'm also assuming the JSON input does not default to max_depth=1, which could explain my error messages, since my logs contain characters that look like the start of a JSON array (the leading "[03-02" would, for example, explain "invalid character '3' after array element").
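
For reference, the processor I mean is decode_json_fields, which does document a max_depth option. A sketch of how it could be wired up (the values here are illustrative, not a tested config):

    processors:
      - decode_json_fields:
          fields: ["log"]
          max_depth: 1        # decode only the top-level object
          target: ""          # merge decoded keys into the event root
          overwrite_keys: true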

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.