Grok parse does not work, but looks fine in the debugger

ELK: 7.7.1
In the grok debugger the pattern looks good, but when I send filebeat->logstash->elasticsearch, none of the fields I mapped show up in Kibana. Anyone have an idea?

I have the following log structure:

2020-06-18T23:10:43.377Z [pid:#PID<0.474.0>, application: :logster, request_id: "Fhm_oW8UBaB9RngAAABM"] [info] {"action":"get","controller":"Web.PlayersController","duration":51.271,"format":"json","method":"POST","params":{"player_id":"b5c0c249-1e78-4e14-b3a2-ad283a358ec1"},"path":"/api/players","state":"set","status":200}

My filebeat has:

    - type: log
      enabled: true
      paths:
        - /tmp/logs/info.log

My logstash.conf filter is:

    filter {
        if [fileset][name] == "log" {
          grok {
            match => { "message" => ["%{TIMESTAMP_ISO8601:timestamp} \[pid:#PID<%{NOTSPACE:pid}>, application: :%{GREEDYDATA:application}, request_id: \"%{GREEDYDATA:request_id}\"\] \[%{LOGLEVEL:level}\] {%{DATA:json_data}}"] }
            remove_field => "message"
          }
          date {
            match => [ "timestamp", "TIMESTAMP_ISO8601"]
          }
          json {
            source => "json_data"
            target => "log"
          }
        }
    }

Filebeat processors debug output:

    2020-06-27T23:25:41.542+0200	DEBUG	[processors]	processing/processors.go:187	Publish event: {
      "@timestamp": "2020-06-27T21:25:41.541Z",
      "@metadata": {
        "beat": "filebeat",
        "type": "_doc",
        "version": "7.7.1"
      },
      "agent": {
        "version": "7.7.1",
        "type": "filebeat",
        "ephemeral_id": "68977da9-51f1-4805-b1b8-77a79154b964",
        "hostname": "**********",
        "id": "fa178769-e8ed-49b9-857b-1f9952306585"
      },
      "ecs": {
        "version": "1.5.0"
      },
      "host": {
        "architecture": "x86_64",
        "os": {
          "name": "*******",
          "kernel": "********",
          "build": "*******",
          "platform": "darwin",
          "version": "******",
          "family": "darwin"
        },
        "id": "1ACB892B-00D1-5EB0-BA1D-E4C63D269CD8",
        "ip": [
    *******
        ],
        "name": "*******",
        "mac": [
          ******
        ],
        "hostname": "*******"
      },
      "log": {
        "file": {
          "path": "/tmp/logs/info.log"
        },
        "offset": 536
      },
      "message": "2020-06-26T17:44:11.731Z [pid: #PID<0.474.0>, application: :logster, request_id: \"FhwidG1_g6h9RngAAAAB\"] [info] {\"action\":\"get\",\"controller\":\"Web.PlayersController\",\"duration\":101.043,\"format\":\"json\",\"method\":\"POST\",\"params\":{\"player_id\":\"664981e8-293c-42e7-9b7c-5312827600af\"},\"path\":\"/api/players\",\"state\":\"set\",\"status\":200}",
      "input": {
        "type": "log"
      }
    }

Hello, perhaps on that line you meant [input][type] == 'log'?
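
Something like this, assuming you still want a conditional (a sketch only; [input][type] comes from the Filebeat event shown above, whereas [fileset][name] is only populated by Filebeat modules, so it should never match a plain log input):

    filter {
      # The event Filebeat publishes above contains "input": { "type": "log" },
      # so this condition is true for these events.
      if [input][type] == "log" {
        # put the grok/date/json filters from the original config here;
        # the add_tag below is just a placeholder to keep the sketch valid
        mutate { add_tag => [ "app_log" ] }
      }
    }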

No, because before I had no condition at all and still did not get any of the fields.

You need to include the { and } in the [json_data] field. Also, remove the TIMESTAMP_ from the date filter pattern.

input { generator { count => 1 lines => [ '2020-06-18T23:10:43.377Z [pid:#PID<0.474.0>, application: :logster, request_id: "Fhm_oW8UBaB9RngAAABM"] [info] {"action":"get","controller":"Web.PlayersController","duration":51.271,"format":"json","method":"POST","params":{"player_id":"b5c0c249-1e78-4e14-b3a2-ad283a358ec1"},"path":"/api/players","state":"set","status":200}' ] } }
filter {
      grok {
        match => { "message" => ["%{TIMESTAMP_ISO8601:timestamp} \[pid:#PID<%{NOTSPACE:pid}>, application: :%{GREEDYDATA:application}, request_id: \"%{GREEDYDATA:request_id}\"\] \[%{LOGLEVEL:level}\] %{GREEDYDATA:json_data}"] }
        remove_field => "message"
      }
      date { match => [ "timestamp", "ISO8601"] }
      json { source => "json_data" target => "log" }
}

works...

 "@timestamp" => 2020-06-18T23:10:43.377Z,
"application" => "logster",
        "log" => {
    "controller" => "Web.PlayersController",
        "status" => 200,
        "params" => {
        "player_id" => "b5c0c249-1e78-4e14-b3a2-ad283a358ec1"
    },
      "duration" => 51.271,
          "path" => "/api/players",
        "action" => "get",
         "state" => "set",
        "format" => "json",
        "method" => "POST"
},
 "request_id" => "Fhm_oW8UBaB9RngAAABM",

etc.

Not sure what I'm doing wrong, but the fields do not show up in Kibana. Is there a chance that Filebeat processors are overwriting the fields? How can I correctly expose the new fields?

Are you sure the events are passing through logstash and that you are not doing filebeat->elasticsearch?

Perhaps configure logstash with

output { stdout { codec => rubydebug } }

and see if the message is getting parsed. In kibana, is the [message] field removed? It would be if the grok matched. Do you have a _grokparsefailure tag?

I do have _grokparsefailure and beats_input_codec_plain_applied

Here my logstash config:

    input {
      beats {
        port => 5044
      }
    }
    filter {
        grok {
            match => { "message" => ["%{TIMESTAMP_ISO8601:timestamp} \[pid:#PID<%{NOTSPACE:pid}>, application: :%{GREEDYDATA:application}, request_id: \"%{GREEDYDATA:request_id}\"\] \[%{LOGLEVEL:level}\] %{GREEDYDATA:json_data}"] }
            remove_field => "message"
        }
        date { match => [ "timestamp", "ISO8601"] }
        json { source => "json_data" target => "log" }

    }
    output {
      elasticsearch {
        hosts => ["localhost"]
        manage_template => false
        index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
      }
    }

In kibana, is the [message] field removed? => No
My filebeat.yml is:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /tmp/logs/info.log

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 1
  template:
    pattern: "filebeat-*"

setup.dashboards.enabled: true

setup.dashboards.index: "filebeat-*"

setup.kibana:

output.logstash:
  hosts: ["localhost:5044"]

processors:
  - add_host_metadata: ~

logging.selectors: ["*"]

Holy moly, I found it: a missing %{SPACE} in the pid part of the pattern.
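
In other words, the real lines have a space between pid: and #PID that the pattern did not allow for. A sketch of the corrected grok (same pattern as before, with %{SPACE} added; %{SPACE} matches zero or more whitespace characters):

    grok {
      # %{SPACE} accepts the "pid: #PID" form in the actual log lines
      match => { "message" => ["%{TIMESTAMP_ISO8601:timestamp} \[pid:%{SPACE}#PID<%{NOTSPACE:pid}>, application: :%{GREEDYDATA:application}, request_id: \"%{GREEDYDATA:request_id}\"\] \[%{LOGLEVEL:level}\] %{GREEDYDATA:json_data}"] }
      remove_field => "message"
    }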

I also have other log line formats like these:

2020-06-26T17:43:20.780Z [pid: #PID<0.337.0>, application: :phoenix] [info] Running Web.Endpoint with cowboy 2.8.0 at 0.0.0.0:4000 (http)

2020-06-26T17:43:20.796Z [pid: #PID<0.323.0>, application: :phoenix] [info] Access Web.Endpoint at http://localhost:4000

2020-06-26T17:44:11.632Z [pid: #PID<0.474.0>, application: :phoenix, request_id: "FhwidG1_g6h9RngAAAAB"] [info] POST /api/players

2020-06-26T17:44:11.730Z [pid: #PID<0.474.0>, application: :phoenix, request_id: "FhwidG1_g6h9RngAAAAB"] [info] Sent 200 in 97ms

Now I have the fields; only the json_data is not parsing into fields properly.

How can I convert the JSON into fields? And how can I parse multiple different line formats like in the example above?

That's really funny. When I first looked at your initial post I noticed the difference in spacing around pid: #PID between the pattern and the sample data (copied from the beats section), but when I started replying I looked at the sample message at the top of your post and there was no space, so I thought you must have edited it :smiley:

If you have multiple formats some of which contain JSON I would do something like

grok {
    match => { "message" => ["%{TIMESTAMP_ISO8601:timestamp} \[pid:( )?#PID<%{NOTSPACE:pid}>, application: :%{GREEDYDATA:application}, request_id: \"%{GREEDYDATA:request_id}\"\] \[%{LOGLEVEL:level}\] %{GREEDYDATA:restOfLine}"] }
    remove_field => "message"
}
date { match => [ "timestamp", "ISO8601"] }
json { source => "restOfLine" target => "log" remove_field => "restOfLine" skip_on_invalid_json => true }

so that if there is JSON it gets parsed; otherwise the message is left in the restOfLine field (name it however you want).

Note the ( )? to make the space in the pid optional.
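
Note also that some of the lines in your samples (the "Running Web.Endpoint" and "Access Web.Endpoint" ones) have no request_id at all, so they would still fail this pattern. If those lines need to match too, the request_id section could be made optional in the same way; a sketch (untested against your data, and using %{DATA} for the application so the optional group can still match):

    grok {
      # "( )?" makes the space after pid: optional; the whole ", request_id: ..." group
      # is optional so lines without a request_id still match
      match => { "message" => ["%{TIMESTAMP_ISO8601:timestamp} \[pid:( )?#PID<%{NOTSPACE:pid}>, application: :%{DATA:application}(, request_id: \"%{DATA:request_id}\")?\] \[%{LOGLEVEL:level}\] %{GREEDYDATA:restOfLine}"] }
      remove_field => "message"
    }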

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.