Grok parse failure even though the Grok Debugger parses fine

Hi All,

I am facing a _grokparsefailure for my logs even though the Grok Debugger shows all fields parsed correctly; Logstash is failing on every field. Below is my Logstash filter:

filter {
   grok {
      match => { "message" => "%{IPORHOST:[source][address]} (?:-|%{HTTPDUSER:[access][user][identity]}) (?:-|%{HTTPDUSER:[user][name]}) \[%{HTTPDATE:timestamp}\] (%{HTTPDUSER:[user][name]})? \"(?:%{WORD:[http][request][method]} %{NOTSPACE:[url][original]}(?: HTTP/%{NUMBER:[http][version]})?|%{DATA})\" (?:-|%{INT:[http][response][status_code]:int}) (?:-|%{INT:[http][response][body][bytes]:int}bytes) \"(?:-|%{IPORHOST:[destination][address]})\" \"%{DATA:session}\" \[%{DATA:extra}\] (%{INT:time_taken:int})ms" }
      remove_field => ["message"]
   }
}
output {
   stdout { codec => rubydebug }
}

and this is the output from debug mode:

         "event" => {
        "original" => "{\"@timestamp\":\"2023-08-03T08:29:04.402Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"8.5.3\"},\"agent\":{\"ephemeral_id\":\"a6a96b92-5584-44ed-b629-dd19469.uat.dbs.com\",\"type\":\"filebeat\",\"version\":\"8.5.3\"},\"log\":{\"offset\":0,\"file\":{\"path\":\"/tmp/csec/httpd/access_log1.log\"}},\"message\":\"10.92.245.37 - - [12/Jul/2023:08:00:07 +0800] - \\\"GE] 0ms\",\"metadata\":{\"component_type\":\"csec_httpd_app\",\"timezone\":\"Asia/Singapore\",\"application\":\"CSEC-ENT-HD-ACC\"},\"topic\":\"testtopic2006\",\"input\":{\"type\":\"filestream\"},\"ecs\":{\"ver
    },
       "message" => "{\"@timestamp\":\"2023-08-03T08:29:04.402Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"8.5.3\"},\"agent\":{\"ephemeral_id\":\"a6a96b92-5584-44ed-b629-dd1946928at.dbs.com\",\"type\":\"filebeat\",\"version\":\"8.5.3\"},\"log\":{\"offset\":0,\"file\":{\"path\":\"/tmp/csec/httpd/access_log1.log\"}},\"message\":\"10.92.245.37 - - [12/Jul/2023:08:00:07 +0800] - \\\"GET 0ms\",\"metadata\":{\"component_type\":\"csec_httpd_app\",\"timezone\":\"Asia/Singapore\",\"application\":\"CSEC-ENT-HD-ACC\"},\"topic\":\"testtopic2006\",\"input\":{\"type\":\"filestream\"},\"ecs\":{\"versi
      "@version" => "1",
    "@timestamp" => 2023-08-03T08:29:15.239531418Z,
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Below is one of my sample log lines, which parses fine in the Grok Debugger. Since these are custom logs, the grok pattern is designed around them; it could probably be improved, but right now Logstash is failing on it.

10.92.11.10 - - [12/Jul/2023:08:00:07 +0800] - "GET /isalive HTTP/1.1" 200 15bytes "-" "ZK3tBx_wNPwl3QRmANzTWgAAAA8" [-] 0ms
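
For reference, this is roughly how that sample line maps to the captures in the pattern above (the trailing "bytes" and "ms" are literals in the pattern, not part of the captured numbers):

10.92.11.10                     -> %{IPORHOST:[source][address]}
- -                             -> [access][user][identity] and [user][name] (both "-", so not captured)
[12/Jul/2023:08:00:07 +0800]    -> %{HTTPDATE:timestamp}
-                               -> optional second (%{HTTPDUSER:[user][name]})?
"GET /isalive HTTP/1.1"         -> [http][request][method], [url][original], [http][version]
200                             -> [http][response][status_code]
15bytes                         -> [http][response][body][bytes] + literal "bytes"
"-"                             -> [destination][address] ("-", so not captured)
"ZK3tBx_wNPwl3QRmANzTWgAAAA8"   -> %{DATA:session}
[-]                             -> %{DATA:extra}
0ms                             -> time_taken + literal "ms"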

Here the Grok Debugger also shows the pattern matching this line correctly.

Did you edit anything in this output you shared?

The value of event.original is different from the value of message; this should not happen.

In the event.original you have this:

10.92.245.37 - - [12/Jul/2023:08:00:07 +0800] - "GE] 0ms"

But in the message field you have this:

10.92.245.37 - - [12/Jul/2023:08:00:07 +0800] - "GET 0ms"

The T from GET is a ], so this seems to have been edited. Please share the original log error without editing it, as this can lead to confusion.

But this is not the issue. The issue is that this message really does not match the grok; it is not the same message you tested in the Grok Debugger. This one is missing a lot of things: it does not have the endpoint, the HTTP version, or the response code.

You need to add another pattern to deal with that kind of message.

If you test this message in the Grok Debugger, you will see that it fails:

10.92.245.37 - - [12/Jul/2023:08:00:07 +0800] - "GET 0ms"
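
If lines like that really do exist in the file, you can give the grok filter a list of patterns; they are tried in order and the first match wins. A minimal sketch, assuming you keep your full pattern first and add a looser fallback (the fallback pattern and its field names here are only illustrative, not a fix for your exact format):

filter {
   grok {
      # Patterns are tried in order; the first one that matches wins.
      match => { "message" => [
         "<your full access-log pattern>",
         "%{IPORHOST:[source][address]} - - \[%{HTTPDATE:timestamp}\] - %{GREEDYDATA:rest}"
      ] }
   }
}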

Hello @leandrojmp,
thanks for your response.
I have not edited any of the messages or data that I shared.

If you see my screenshot, with the same grok pattern I used in Kibana, all fields are showing correctly for this log.

The point is: if Kibana shows all fields parsed with the same grok, shouldn't Logstash show them too?

Here is another example of the processed logs below:

{
       "message" => "{\"@timestamp\":\"2023-08-04T02:19:00.208Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"8.5.3\"},\"host\":{\"name\":\"x01stsmgapp7a.vsi.uat.dbs.com\"},\"agent\":{\"version\":\"8.5.3\",\"ephemeral_id\":\"70d232fa-b29d-4dba-a979-a762597d93d6\",\"id\":\"d2a754c4-f257-4701-9169-7d308db3ece5\",\"name\":\"x01stsmgapp7a.vsi.uat.dbs.com\",\"type\":\"filebeat\"},\"log\":{\"offset\":0,\"file\":{\"path\":\"/tmp/csec/httpd/access_log1.log\"}},\"message\":\"10.92.245.37 - - [12/Jul/2023:08:00:07 +0800] - \\\"GET /isalive HTTP/1.1\\\" 200 15bytes \\\"-\\\" \\\"ZK3tBx_wNPwl3QRmANzTWgAAAA8\\\" [-] 0ms\",\"input\":{\"type\":\"filestream\"},\"metadata\":{\"timezone\":\"Asia/Singapore\",\"application\":\"CSEC-ENT-HD-ACC\",\"component_type\":\"csec_httpd_app\"},\"topic\":\"testtopic2006\",\"ecs\":{\"version\":\"8.0.0\"}}",
    "@timestamp" => 2023-08-04T02:19:06.291448796Z,
          "tags" => [
        [0] "_grokparsefailure"
    ],
      "@version" => "1",
         "event" => {
        "original" => "{\"@timestamp\":\"2023-08-04T02:19:00.208Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"8.5.3\"},\"host\":{\"name\":\"x01stsmgapp7a.vsi.uat.dbs.com\"},\"agent\":{\"version\":\"8.5.3\",\"ephemeral_id\":\"70d232fa-b29d-4dba-a979-a762597d93d6\",\"id\":\"d2a754c4-f257-4701-9169-7d308db3ece5\",\"name\":\"x01stsmgapp7a.vsi.uat.dbs.com\",\"type\":\"filebeat\"},\"log\":{\"offset\":0,\"file\":{\"path\":\"/tmp/csec/httpd/access_log1.log\"}},\"message\":\"10.92.245.37 - - [12/Jul/2023:08:00:07 +0800] - \\\"GET /isalive HTTP/1.1\\\" 200 15bytes \\\"-\\\" \\\"ZK3tBx_wNPwl3QRmANzTWgAAAA8\\\" [-] 0ms\",\"input\":{\"type\":\"filestream\"},\"metadata\":{\"timezone\":\"Asia/Singapore\",\"application\":\"CSEC-ENT-HD-ACC\",\"component_type\":\"csec_httpd_app\"},\"topic\":\"testtopic2006\",\"ecs\":{\"version\":\"8.0.0\"}}"
    }
}

Below is the Logstash filter too:

filter {
   grok {
      match => { "message" => '%{IPORHOST:[source][address]} (?:-|%{HTTPDUSER:[access][user][identity]}) (?:-|%{HTTPDUSER:[user][name]}) \[%{HTTPDATE:timestamp}\] (%{HTTPDUSER:[user][name]})? \"(?:%{WORD:[http][request][method]} %{NOTSPACE:[url][original]}(?: HTTP/%{NUMBER:[http][version]})?|%{DATA})\" (?:-|%{INT:[http][response][status_code]:int}) (?:-|%{INT:[http][response][body][bytes]:int}bytes) \"(?:-|%{IPORHOST:[destination][address]})\" \"%{DATA:session}\" \[%{DATA:extra}\] (%{INT:time_taken:int})ms' }
   }
}
output {
   stdout { codec => rubydebug }
}

Can you share your entire logstash pipeline?

Just noticed it now, but your message field and also the event.original field contain a JSON document from Beats.

You need to parse the JSON Beats message first. Do you have a json filter in your pipeline before your grok filter?

I am using only one config for Logstash. I have custom logs, but I am testing with just this one log event using this grok. The file I am using is in .log format; there is no JSON data in it.

This is my complete Logstash pipeline configuration:

[utsxxxx@xxxxxxxxx7a bin]$ cat ../pipelines/2csec.conf
input {
   kafka {
      group_id => "csec_httpd"
      client_id => "csec_httpd"
      topics => [ "testtopic2006" ]
      bootstrap_servers => "xxxxx4a.vsi.uat.abc.com:9093,xxxxx5a.vsi.uat.abc.com:9093,xxxxxx6a.vsi.uat.abc.com:9093"
      security_protocol => "SSL"
      ssl_key_password => "${KAFKA_SSL_PWD}"
      ssl_keystore_location => "/xxxx/xxx/xxxkafkaClient/server.keystore.jks"
      ssl_keystore_password => "${KAFKA_SSL_PWD}"
      ssl_truststore_location => "/xxxx/xxx/xxxkafkaClient/server.truststore.jks"
      ssl_truststore_password => "${KAFKA_SSL_PWD}"
   }
}
filter {
   grok {
      match => { "message" => "%{IPORHOST:[source][address]} (?:-|%{HTTPDUSER:[access][user][identity]}) (?:-|%{HTTPDUSER:[user][name]}) \[%{HTTPDATE:timestamp}\] (%{HTTPDUSER:[user][name]})? \"(?:%{WORD:[http][request][method]} %{NOTSPACE:[url][original]}(?: HTTP/%{NUMBER:[http][version]})?|%{DATA})\" (?:-|%{INT:[http][response][status_code]:int}) (?:-|%{INT:[http][response][body][bytes]:int}bytes) \"(?:-|%{IPORHOST:[destination][address]})\" \"%{DATA:session}\" \[%{DATA:extra}\] (%{INT:time_taken:int})ms" }
   }
}
output {
   stdout { codec => rubydebug }
   ##### dump to the screen part here
   elasticsearch {
      cacert => "/xxxx/xxx/xxxkafkaClient/elasticsearch.uat.abc.corp.cer"
      ssl => true
      ssl_certificate_verification => true
      user => "logstash_output"
      password => "${LOGSTASH_OUTPUT_PWD}"
      id => "csec_id"
      action => "index"
      hosts => [ "https://xxxxxx1a.vsi.uat.abc.com:9210", "https://xxxxxx2a.vsi.uat.abc.com:9210", "https://xxxxx3a.vsi.uat.abc.com:9210" ]
      index => "csec-%{+YYYY.MM}"
      #index => "csec-2023-test"
      resurrect_delay => 10
      timeout => 300
   }
}

There is another grok pattern I have tried, but Logstash is failing in the same way:

%{IPORHOST:client_ip} - - \[%{HTTPDATE:timestamp}\] - "%{WORD:request_method} %{URIPATHPARAM:request_path} HTTP/%{NUMBER:http_version}" %{NUMBER:response_code} %{NUMBER:response_size}bytes "%{DATA:referrer}" "%{DATA:user_agent}" \[-\] %{NUMBER:response_time}ms

This is my Filebeat config as well:

[utsxxx@xxxx7a filebeat-8.5.3-linux-x86_64]$ cat filebeat.yml
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

#===============CSESC TEST FIle processing ===================##

  ## AM logs processing
- fields_under_root: true
  paths:
  - /tmp/csec/httpd/access_log1.log
  tail_files: true
  close_inactive: 20h
  type: filestream
  fields:
    metadata:
      timezone: Asia/Singapore
      application: CSEC-ENT-HD-ACC
      component_type: csec_httpd_app
    topic: testtopic2006
  enabled: true

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 3

output.kafka:
  # Array of hosts to connect to.
  hosts: ["xxxxx4a.vsi.uat.abc.com:9093", "xxxxxx5a.vsi.uat.abc.com:9093", "xxxx6a.vsi.uat.abc.com:9093"]
  ssl.certificate_authorities: ["/truststore.pem"]
  ssl.certificate: "/keystore.pem"
  ssl.key: "/kf.client.key"
  topic: '%{[topic]}'
  partition.round_robin:
    reachable_only: true
  metadata:
    retry.max: 5
    retry.backoff: 1000ms
    refresh_frequency: 10m
  worker: 1
  max_retries: -1
  timeout: 30s
  broker_timeout: 10s
  compression: none
  required_acks: 1

logging.level: info
logging.to_files: true
logging.files:
  name: filebeat
  path: /xxxx/filebeat-8.5.3-linux-x86_64/logs
  #  rotateeverybytes: 10485760 # = 10MB
  keepfiles: 7
  permissions: 0644
[utsxxxx@xxxxxx7a filebeat-8.5.3-linux-x86_64]$

Your original log may not be JSON, but since you are using Filebeat, it will send JSON to Kafka, and you need to parse this JSON in Logstash before anything else.

For example, if you have a something.log file with the following line:

2023-08-04 00:50:00 something somethingElse another thing

Filebeat will read this line and store it in a field named message inside a JSON document that is then sent to your Kafka, and Logstash will consume it.

In your Kafka you will have something like this:

{ "some-key-value-pairs", "message": "2023-08-04 00:50:00 something somethingElse another thing" }

This is exactly what you have; just look at how the message field in Logstash is a JSON document.

When Logstash consumes from Kafka, the event from Kafka will be stored in a field also named message, so in Logstash you will have something like this:

message => { "some-key-value-pairs", "message": "2023-08-04 00:50:00 something somethingElse another thing" }

You need to parse this message field, which comes from Filebeat, in Logstash; this will extract the JSON fields and your original message.

Add this before your grok filter:

json {
    source => "message"
}
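
So the filter section would look roughly like this (your grok pattern abbreviated):

filter {
   # Parse the Beats JSON envelope first; its fields, including the inner
   # "message" with the original access-log line, are placed at the root
   # of the event.
   json {
      source => "message"
   }
   grok {
      match => { "message" => "<your access-log grok pattern>" }
   }
}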

Thank you @leandrojmp, this really makes sense to me now.

It seems grok is parsing the fields now, but the indexing seems to be failing. My Kibana is reporting the errors below:

1 of 4 shards failed

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 4,
    "successful": 3,
    "skipped": 3,
    "failed": 1,
    "failures": [
      {
        "shard": 0,
        "index": "csec-2023.08",
        "node": "vl9X_SCVQ8uZaD1Y2l1skQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "error fetching [timestamp]: Field [timestamp] doesn't support formats.",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Field [timestamp] doesn't support formats."
          }
        }
      }
    ]
  },
  "hits": {
    "max_score": null,
    "hits": []
  }
}
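
For reference, that error usually means the string field extracted by grok as [timestamp] was mapped as text/keyword rather than date, so Kibana cannot apply a date format to it. One common approach, shown only as a sketch (not necessarily the fix used here), is to convert it with a date filter and drop the string field before indexing:

filter {
   # Assumes grok already extracted %{HTTPDATE:timestamp} as a string.
   date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      target => "@timestamp"
      remove_field => [ "timestamp" ]
   }
}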

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.