Filebeat to parse mixed data (strings and JSON)

Hello,

First of all, I'm new to Filebeat, so I may say stupid things. Forgive me in advance.

I have to parse a log file whose lines look like:
2021-03-18 09:33:37,131 -- TYPE -- {"json1": "data", "json2": "data", "json3": "data"}

I would like to decode:

  • Timestamp
  • Type
  • JSON data
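To check that the split I'm after is well defined, here is a quick standalone sketch in plain Python (not Filebeat code), splitting on the " -- " separator:

```python
import json

# Example line from the log file: timestamp -- type -- JSON payload,
# with " -- " as the separator between the three parts.
line = '2021-03-18 09:33:37,131 -- TYPE -- {"json1": "data", "json2": "data", "json3": "data"}'

# Split into at most three parts so " -- " inside the JSON (if any) is kept.
timestamp, kind, payload = line.split(" -- ", 2)
data = json.loads(payload)

print(timestamp)  # 2021-03-18 09:33:37,131
print(kind)       # TYPE
```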

I tried to add the parsing in the filebeat.inputs section of the filebeat.yml file:

processors:
  - dissect:
      tokenizer: "%{timestamp} -- %{type} -- %{json}"
      field: "message"
      target_prefix: "ocr.response"
  - decode_json_fields:
      fields: ["ocr.response.json"]
      process_array: true
      max_depth: 1
      overwrite_keys: false
      target: "json"

But I haven't succeeded in replacing the timestamp shown in Kibana with the timestamp read from my log file.

Then I tried to write a module with this ingest pipeline:

{
  "description": "Pipeline for parsing ocr response logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
        "%{TIMESTAMP_ISO8601:ocr.response.timestamp} -- %{WORD:ocr.response.type} -- %{DATA:ocr.response.json}"
        ],
        "pattern_definitions": {
          "RESPONSE_TIME": "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
        },
        "ignore_missing": true
      }
    },
    {
      "json": {
        "field": "ocr.response.json",
        "target_field": "ocr.response.json_decoded"
      }
    },
    {
      "remove":{
        "field": "message"
      }
    },
    {
      "rename": {
        "field": "ocr.response.message1",
        "target_field": "ocr.response.message",
        "ignore_failure": true
      }
    },
    {
      "date": {
        "field": "ocr.response.timestamp",
        "target_field": "@timestamp",
        "formats": ["EEE MMM dd H:m:s yyyy", "EEE MMM dd H:m:s.SSSSSS yyyy"],
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": "ocr.response.timestamp",
        "ignore_failure": true
      }
    }
  ],
  "on_failure" : [{
    "set" : {
      "field" : "response.message",
      "value" : "{{ _ingest.on_failure_message }}"
    }
  }]
}
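One thing I notice rereading this: grok extracts %{TIMESTAMP_ISO8601}, e.g. 2021-03-18 09:33:37,131, but my date processor formats are "EEE MMM dd ..." patterns, which cannot match it; something like "yyyy-MM-dd HH:mm:ss,SSS" (Java time syntax) should. A quick Python sanity check of the shape (strptime's %f accepts the short fractional field after the comma):

```python
from datetime import datetime

# The timestamp grok extracts is ISO-like, with a comma before the millis.
# Python's %f accepts 1-6 fractional digits, so "131" parses as 131000 µs.
ts = datetime.strptime("2021-03-18 09:33:37,131", "%Y-%m-%d %H:%M:%S,%f")
print(ts.isoformat())  # 2021-03-18T09:33:37.131000
```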

Now it's the JSON part that fails to be decoded.

Which is the best method for parsing such a log file?
What am I doing wrong?

Regards,

Olivier

Hi @Olivier_Gerault, welcome to the Elastic community forums!

I think you are on the right track with trying to do the parsing in filebeat.yml. I think the only processor you need to add is the timestamp processor.
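For reference, on Filebeat versions that have it (7.x, where it is still beta), something along these lines should work, with the field name taken from the dissect config above; the layout uses Go reference-time syntax, and the comma before the milliseconds is worth double-checking against your data:

```yaml
processors:
  - timestamp:
      field: ocr.response.timestamp
      layouts:
        - '2006-01-02 15:04:05,000'
      test:
        - '2021-03-18 09:33:37,131'
```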

Shaunak

Hi @shaunak
Thanks for your reply.
I forgot to mention that I'm on Filebeat 6.8, where the timestamp processor seems not to be available.

Olivier

Hello, I still have the problem.
Any suggestions?
I changed the format of the output; now it looks like:
02/Apr/2021:12:24:12 +200 -- TYPE -- {"json1": "data", "json2": "data", "json3": "data"}
which is understood by the apache2 module (for another log file).

How can I tell Filebeat that the timestamp found in the line is the one to use, instead of the time at which the line was read?
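For my own testing, here is a quick Python check of the new timestamp layout (assuming the zone is really four digits, i.e. +0200; in an ingest pipeline date processor the equivalent Java format would be "dd/MMM/yyyy:HH:mm:ss Z"):

```python
from datetime import datetime

# Apache-style timestamp. Note the zone must be four digits (+0200);
# a bare "+200" like in my example line above would not parse.
ts = datetime.strptime("02/Apr/2021:12:24:12 +0200", "%d/%b/%Y:%H:%M:%S %z")
print(ts.isoformat())  # 2021-04-02T12:24:12+02:00
```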

FB 6.8

Regards,

Olivier Gérault

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.