Filebeat to parse mixed data (strings and json)

Olivier_Gerault · March 18, 2021, 9:23am

Hello,

First of all, I'm new to Filebeat, so I may say stupid things. Forgive me in advance.

I have to parse a log file whose lines look like this:
2021-03-18 09:33:37,131 -- TYPE -- {"json1": "data", "json2": "data", "json3": "data"}

I would like to extract:

Timestamp
Type
JSON data

I tried adding the parsing to the filebeat.inputs section of my filebeat.yml:

processors:
  - dissect:
      tokenizer: "%{timestamp} -- %{type} -- %{json}"
      field: "message"
      target_prefix: "ocr.response"
  - decode_json_fields:
      fields: ["ocr.response.json"]
      process_array: true
      max_depth: 1
      overwrite_keys: false
      target: "json"

But I can't manage to replace the timestamp shown in Kibana with the timestamp read from my log file.
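What those two processors produce can be modelled locally. In the sketch below (plain Python, not Filebeat's implementation; `apply_processors` is a hypothetical name), the extracted date ends up as the string field `ocr.response.timestamp` and the JSON is decoded under `json`, but nothing copies that string into `@timestamp`, which keeps the time at which Filebeat read the line. That would be consistent with Kibana still showing the read time.

```python
import json
from datetime import datetime, timezone


def apply_processors(message: str) -> dict:
    """Rough local model of the dissect + decode_json_fields processors (not Filebeat code)."""
    # Filebeat sets @timestamp to the time it read the line.
    event = {"@timestamp": datetime.now(timezone.utc).isoformat(), "message": message}

    # dissect "%{timestamp} -- %{type} -- %{json}" with target_prefix "ocr.response";
    # maxsplit=2 leaves the rest of the line in the last field.
    timestamp, type_, payload = message.split(" -- ", 2)
    event["ocr"] = {"response": {"timestamp": timestamp, "type": type_, "json": payload}}

    # decode_json_fields on ocr.response.json with target "json".
    event["json"] = json.loads(event["ocr"]["response"]["json"])
    return event


sample = '2021-03-18 09:33:37,131 -- TYPE -- {"json1": "data", "json2": "data", "json3": "data"}'
event = apply_processors(sample)
print(event["ocr"]["response"]["timestamp"])  # still a plain string
print(event["@timestamp"])                    # still the read time
```

So with these processors alone, some extra step still has to turn the string into `@timestamp`.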

Then I tried to write a module, with this ingest pipeline:

{
  "description": "Pipeline for parsing ocr response logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
        "%{TIMESTAMP_ISO8601:ocr.response.timestamp} -- %{WORD:ocr.response.type} --     %{DATA:ocr.response.json}"
        ],
        "pattern_definitions": {
          "RESPONSE_TIME": "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
        },
        "ignore_missing": true
      }
    },
    {
      "json": {
        "field": "ocr.response.json",
        "target_field": "ocr.response.json_decoded"
      }
    },
    {
      "remove":{
        "field": "message"
      }
    },
    {
      "rename": {
        "field": "ocr.response.message1",
        "target_field": "ocr.response.message",
        "ignore_failure": true
      }
    },
    {
      "date": {
        "field": "ocr.response.timestamp",
        "target_field": "@timestamp",
        "formats": ["EEE MMM dd H:m:s yyyy", "EEE MMM dd H:m:s.SSSSSS yyyy"],
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": "ocr.response.timestamp",
        "ignore_failure": true
      }
    }
  ],
  "on_failure" : [{
    "set" : {
      "field" : "response.message",
      "value" : "{{ _ingest.on_failure_message }}"
    }
  }]
}

Now it's the JSON part that fails to be decoded.
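A likely cause, worth checking: `%{DATA:ocr.response.json}` is the last element of the grok pattern. In the standard grok pattern set, `DATA` is the lazy regex `.*?` and `GREEDYDATA` is the greedy `.*`, and grok searches the field without anchoring the pattern at the end, so a trailing `DATA` can match the empty string. The `json` processor then has nothing to parse. Python's `re` behaves the same way for these two regexes; the prefix below is a simplified stand-in for `TIMESTAMP_ISO8601` and `WORD`, not the real grok definitions.

```python
import re

line = '2021-03-18 09:33:37,131 -- TYPE -- {"json1": "data", "json2": "data", "json3": "data"}'

# Simplified stand-ins: \S+ \S+ for the timestamp, \w+ for WORD.
prefix = r"(?P<timestamp>\S+ \S+) -- (?P<type>\w+) -- "
with_data = re.search(prefix + r"(?P<json>.*?)", line)    # DATA: lazy
with_greedy = re.search(prefix + r"(?P<json>.*)", line)   # GREEDYDATA: greedy

print(repr(with_data.group("json")))    # the lazy group matches nothing
print(repr(with_greedy.group("json")))  # the whole JSON payload
```

If that is what happens in the pipeline, `%{GREEDYDATA:ocr.response.json}` at the end of the pattern should capture the full JSON text.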

What is the best method for parsing a log file like this?
What am I doing wrong?
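On the timestamp side of the pipeline, the `date` processor's formats (`"EEE MMM dd H:m:s yyyy"` and its fractional variant) describe dates such as `Thu Mar 18 9:33:37 2021`, not `2021-03-18 09:33:37,131`, and `ignore_failure: true` hides the mismatch, so `@timestamp` is left unchanged. A format that fits the sample line would be `"yyyy-MM-dd HH:mm:ss,SSS"`. The check below uses Python's `strptime` with hand-translated, approximate equivalents of those patterns; it illustrates the mismatch and is not what Elasticsearch runs.

```python
from datetime import datetime

sample = "2021-03-18 09:33:37,131"

# Approximate strptime equivalents of the date-processor formats (hand-translated).
candidates = {
    "EEE MMM dd H:m:s yyyy": "%a %b %d %H:%M:%S %Y",
    "yyyy-MM-dd HH:mm:ss,SSS": "%Y-%m-%d %H:%M:%S,%f",
}


def matches(value: str, fmt: str) -> bool:
    """Return True if value parses with the given strptime format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


for java_fmt, py_fmt in candidates.items():
    print(f"{java_fmt!r}: {matches(sample, py_fmt)}")
```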

Regards,

Olivier

shaunak · March 18, 2021, 8:47pm

Hi @Olivier_Gerault, welcome to the Elastic community forums!

I think you're on the right track doing the parsing in filebeat.yml. The only processor you should need to add is the timestamp processor.
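Whichever component does it (the `timestamp` processor in Filebeat versions that have it, or the `date` processor of an Elasticsearch ingest pipeline), the job is the same: parse the extracted string with an explicit layout, decide which timezone it is in (the sample line carries no offset), and overwrite `@timestamp`. A minimal Python model of that step follows; the field name `timestamp`, the helper `set_timestamp` and the UTC+01:00 log timezone are all hypothetical choices. Note that the ingest `date` processor assumes UTC unless its `timezone` option is set.

```python
from datetime import datetime, timedelta, timezone

# Assumption (hypothetical): the application writes local time at UTC+01:00.
LOG_TZ = timezone(timedelta(hours=1))


def set_timestamp(event: dict, field: str = "timestamp") -> dict:
    """Parse event[field] and overwrite event['@timestamp'] with it, expressed in UTC."""
    local = datetime.strptime(event[field], "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=LOG_TZ)
    event["@timestamp"] = local.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return event


# "@timestamp" below is a placeholder read time, not real data.
event = {"@timestamp": "2021-03-18T09:40:02.000+00:00", "timestamp": "2021-03-18 09:33:37,131"}
print(set_timestamp(event)["@timestamp"])  # 2021-03-18T08:33:37.131+00:00
```

Getting the timezone assumption wrong shifts every event by the offset, so it is worth setting it explicitly rather than relying on a default.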

Shaunak

Olivier_Gerault · March 19, 2021, 9:51am

Hi @shaunak
Thanks for your reply.
I forgot to mention that I'm on Filebeat 6.8, where the timestamp processor doesn't seem to be available.

Olivier

Olivier_Gerault · April 2, 2021, 10:34am

Hello, I still have the problem.
Any suggestions?
I changed the output format; it now looks like this:
02/Apr/2021:12:24:12 +200 -- TYPE -- {"json1": "data", "json2": "data", "json3": "data"}
which is a format the apache2 module understands (I use that module for another log file).

How can I tell Filebeat (FB) to use the timestamp found in the line instead of the time at which the line is read?
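For the new format, the matching Java-style pattern would be the Apache access-log style `dd/MMM/yyyy:HH:mm:ss Z` (roughly `%d/%b/%Y:%H:%M:%S %z` in Python). One thing worth double-checking in the sample line: the offset is written `+200`, whereas a standard numeric offset has four digits (`+0200`), and strict parsers will likely reject the three-digit form. A small local check, where `parse_apache_time` is a hypothetical helper:

```python
from datetime import datetime, timezone

APACHE_FMT = "%d/%b/%Y:%H:%M:%S %z"  # strptime analogue of dd/MMM/yyyy:HH:mm:ss Z


def parse_apache_time(value: str):
    """Return an aware UTC datetime, or None if the value does not parse."""
    try:
        return datetime.strptime(value, APACHE_FMT).astimezone(timezone.utc)
    except ValueError:
        return None


print(parse_apache_time("02/Apr/2021:12:24:12 +0200"))  # four-digit offset parses
print(parse_apache_time("02/Apr/2021:12:24:12 +200"))   # three-digit offset: None
```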

FB 6.8

Regards,

Olivier Gérault

system · April 30, 2021, 12:35pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic	Category	Replies	Views	Activity
Parse json data from log file into Kibana via Filebeat and Logstash	Beats (filebeat)	10	9602	May 19, 2020
Filebeat processors decode_json_fields with condition not working	Beats (filebeat)	1	439	March 4, 2020
Filebeat to logstash problem to parse json message	Beats (filebeat)	7	1933	January 10, 2018
Parse JSON data with filebeat	Beats (filebeat)	8	60208	April 24, 2017
Need help with parsing json fields	Logs	4	3713	April 15, 2019