Error : "tags" => [ [0] "_grokparsefailure" ]..Grok filter error

naveenrt23 · August 15, 2019, 7:18pm

Hello,

I am using grok filters for parsing messages and I can see parsing failures but can't figure out what's causing the issue.

Grok Filter:

filter {
      # Create a copy of original message
      mutate {
        add_field => {
          "[@metadata][copyOfMessage]" => "%{[message]}"
        }
      }
      # split message
      mutate {
        split => {
          "[@metadata][copyOfMessage]" => "|"
        }
      }
      if [@metadata][copyOfMessage][6] =~ /^\/api\// {
        grok {
          break_on_match => false
          match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{IP:clientip}\|%{DATA:username}\|%{WORD:method}\|%{DATA:resource}\|%{DATA:protocol}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" }
        }
        grok {
          break_on_match => false
          match => { "resource" => "/%{DATA:dropFirstValue}/%{DATA:dropSecondValue}/%{DATA:repo}/%{GREEDYDATA:resource_path}" }
        }

        # Drop the first two path segments when the resource starts with "/api/"
        mutate {
          remove_field => ["dropFirstValue", "dropSecondValue"]
        }
      }
      else{
        grok {
            # Enable multiple matchers
          break_on_match => false

          match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{IP:clientip}\|%{DATA:username}\|%{WORD:method}\|%{DATA:resource}\|%{DATA:protocol}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" }

            # Extract repo and path
          match => { "resource" => "/%{DATA:repo}/%{GREEDYDATA:resource_path}"}

        }
      }
# Extract resource name
      grok {
        break_on_match => false
        match => { "resource_path" => "(?<resource_name>[^/]+$)" }
      }
# Extract file extension
      grok {
        break_on_match => false
        match => { "resource_path" => "(?<resource_type>[^.]+$)" }
      }
# Parse date field
      date {
        timezone => "UTC"
        match => [ "timestamp_local" , "yyyyMMddHHmmss" ]
        target => "timestamp_object"
      }
      mutate {
        add_field => { "time" => "%{time}"}
      }
      ruby {
        code => "event.set('timestamp', event.get('timestamp_object').to_i * 1000);event.set('time',event.get('timestamp_object').to_i*1000000000 + rand(100000000))"
      }
    }

Output:

{
            "duration" => "9599",
           "timestamp" => 1565891419000,
                 "env" => "test",
                "time" => 1565891419078802209,
          "@timestamp" => 2019-08-15T19:08:04.339Z,
            "@version" => "1",
                "path" => "/testing/test.log",
            "username" => "anonymous",
                "host" => "myself-cg45",
            "resource" => "/api/test/Lighter-group",
     "timestamp_local" => "20190815175019",
              "method" => "POST",
          "statuscode" => "200",
             "message" => "20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/api/test/Lighter-test-group|HTTP/1.1|200|452",
            "protocol" => "HTTP/1.1",
               "bytes" => "452",
                "site" => "XYZ",
                "tags" => [
        [0] "_grokparsefailure"
    ],
            "clientip" => "14.56.55.120",
         "requesttype" => "REQUEST",
    "timestamp_object" => 2019-08-15T17:50:19.000Z
}

Badger · August 15, 2019, 8:05pm

I would not expect a split of a field in [@metadata] to work due to this issue.

Instead of /%{DATA:dropFirstValue}/%{DATA:dropSecondValue}/ and then removing the fields you can just use /%{DATA}/%{DATA}/. Personally I would use /[^/]+/[^/]+
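
As a rough sketch of that suggestion (assuming the /api/ branch of the filter above, and untested against the real logs), the two throwaway captures and the follow-up mutate can be replaced by unnamed segments. Note that the pattern still expects a "/" after the repo segment.

```
grok {
  # Skip the first two path segments without capturing them, so no
  # remove_field is needed afterwards. [^/]+ matches one non-empty segment.
  match => { "resource" => "/[^/]+/[^/]+/%{DATA:repo}/%{GREEDYDATA:resource_path}" }
}
```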

naveenrt23 · August 18, 2019, 9:15pm

Badger · August 18, 2019, 9:33pm

Actually the split works just fine. Your grok in the case where [resource] starts with /api assumes 4 parts to the path, but in that message it only has three. Perhaps

grok { match => { "resource" => "(/[^\/]+)?/[^\/]+/%{DATA:repo}/%{GREEDYDATA:resource_path}" } }

would work?

naveenrt23 · August 18, 2019, 10:12pm

Actually, the message format is variable, and the resource part can have anywhere from 0 to 10 parts.

My understanding of the filter was that it drops the first two words no matter how many parts there are, and that if the message has 3 parts, the 3rd word would be parsed as repo and the 4th part would be an empty string.

naveenrt23 · August 19, 2019, 2:11pm

Is it possible to add a string length check here? For example, if resource_path has length 0, show it as an empty string rather than raising a grok parse failure, or give the field a default value, or remove the field?

naveenrt23 · August 20, 2019, 12:52am

Hello,

I tried different regexes, but none of them seems to solve the issue. Is there any way to fix this in the grok filter?

Badger · August 20, 2019, 12:08pm

You haven't really explained what the issue is. You gave an example, for which I gave you a possible solution, then said there are other possible inputs. That is not enough information to offer a solution.

naveenrt23 · August 20, 2019, 12:31pm

I apologize for not providing the required info. I've used the examples below for testing:

20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/api/test/Lighter-test-group|HTTP/1.1|200|452
20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/list/Lighter-test-group|HTTP/1.1|200|452
20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/api/test/Lighter-test-group/2.0|HTTP/1.1|200|452
20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/list/Lighter-test-group/xyz/123|HTTP/1.1|200|452
20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/api/test/Lighter-test-group/2.0/1.2|HTTP/1.1|200|452

and the grok filter is

filter {
      # Create a copy of original message
      mutate {
        add_field => {
          "[@metadata][copyOfMessage]" => "%{[message]}"
        }
      }
      # split message
      mutate {
        split => {
          "[@metadata][copyOfMessage]" => "|"
        }
      }
      if [@metadata][copyOfMessage][6] =~ /^\/api\// {
        grok {
          break_on_match => false
          match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{IP:clientip}\|%{DATA:username}\|%{WORD:method}\|%{DATA:resource}\|%{DATA:protocol}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" }
        }
        grok {
          break_on_match => false
          match => { "resource" => "(/[^\/]+)?/[^\/]+/%{DATA:repo}/%{GREEDYDATA:resource_path}" }
        }

      }
      else if [@metadata][copyOfMessage][6] =~ /^\/list\// or [@metadata][copyOfMessage][6] =~ /^\/simple\// {
        grok {
          break_on_match => false
          match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{IP:clientip}\|%{DATA:username}\|%{WORD:method}\|%{DATA:resource}\|%{DATA:protocol}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" }
        }
        grok {
          break_on_match => false
          match => { "resource" => "/%{DATA:dropFirstValue}/%{DATA:repo}/%{GREEDYDATA:resource_path}" }
        }

        # Drop the first path segment when the resource starts with "/list/" or "/simple/"
        mutate {
           remove_field => ["dropFirstValue"]
        }
      }
      else{
        grok {
            # Enable multiple matchers
          break_on_match => false

          match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{IP:clientip}\|%{DATA:username}\|%{WORD:method}\|%{DATA:resource}\|%{DATA:protocol}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" }

            # Extract repo and path
          match => { "resource" => "/%{DATA:repo}/%{GREEDYDATA:resource_path}"}

        }
      }
# Extract resource name
      grok {
        break_on_match => false
        match => { "resource_path" => "(?<resource_name>[^/]+$)" }
      }
# Extract file extension
      grok {
        break_on_match => false
        match => { "resource_path" => "(?<resource_type>[^.]+$)" }
      }
# Parse date field
      date {
        timezone => "UTC"
        match => [ "timestamp_local" , "yyyyMMddHHmmss" ]
        target => "timestamp_object"
      }
      mutate {
        add_field => { "time" => "%{time}"}
      }
      ruby {
        code => "event.set('timestamp', event.get('timestamp_object').to_i * 1000);event.set('time',event.get('timestamp_object').to_i*1000000000 + rand(100000000))"
      }
    }

From what I see, the value of repo is parsed differently for different messages.

For messages starting with /api/, the 3rd word has to be the repo. But if I don't use the regex you provided and the resource has three parts, we get a grok parse failure, while anything with more than 3 parts works fine.

naveenrt23 · August 20, 2019, 3:10pm

I've tried using

      match => { "resource" => "^(\/)+[^\/]+(\/)[^\/]+/%{DATA:repo}/%{GREEDYDATA:resource_path}" }

which would match /{firstword}/{secondword}, but the same issue persists.

If the resource has three parts, we get a grok parse failure, but anything with more than 3 parts works fine.

naveenrt23 · August 20, 2019, 3:44pm

I've looked at GreedyData and Data grok syntax which are

DATA .*?
GREEDYDATA .*

If they accept an empty string, I don't see why we are getting a grok parse failure. Is there any way to debug further?
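
One way to narrow this down, sketched here with made-up tag names, is to give each grok its own failure tag through the tag_on_failure option, so the event shows which filter did not match instead of the shared _grokparsefailure.

```
grok {
  match => { "resource" => "/%{DATA:dropFirstValue}/%{DATA:repo}/%{GREEDYDATA:resource_path}" }
  # Hypothetical tag name; any string works here
  tag_on_failure => ["_grokfail_resource"]
}
grok {
  match => { "resource_path" => "(?<resource_name>[^/]+$)" }
  # Hypothetical tag name, distinct from the one above
  tag_on_failure => ["_grokfail_resource_name"]
}
```

The tag left on a failing event then points at the grok whose pattern or input field needs a closer look.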

naveenrt23 · August 20, 2019, 4:36pm

So there is no grok parse failure if the resource has a trailing "/", i.e. when
/api/test/Lighter-test-group becomes /api/test/Lighter-test-group/
and
/list/Lighter-test-group becomes /list/Lighter-test-group/

From this I understand that an empty string or a space works fine with DATA/GREEDYDATA, but is there a workaround for resources without the trailing slash?
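
The trailing-slash behaviour follows from the literal "/" characters in the pattern rather than from DATA or GREEDYDATA themselves. A commented sketch, assuming the /list/ pattern posted earlier in the thread:

```
grok {
  # DATA (.*?) and GREEDYDATA (.*) may both match nothing, but each literal
  # "/" between them must be present in the input.
  #
  #   /list/Lighter-test-group    two slashes, pattern needs three -> no match
  #   /list/Lighter-test-group/   three slashes -> repo = Lighter-test-group,
  #                               resource_path matches the empty string
  match => { "resource" => "/%{DATA:dropFirstValue}/%{DATA:repo}/%{GREEDYDATA:resource_path}" }
}
```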

Badger · August 20, 2019, 8:38pm

The only one of your examples that gets a _grokparsefailure is

        "resource" => "/list/Lighter-test-group",

To fix that change the second grok in the branch that handles /list/ to be

match => { "resource" => "(/%{DATA})?/%{DATA:repo}/%{GREEDYDATA:resource_path}" }

naveenrt23 · August 21, 2019, 3:10am

Grok filter:

  if [@metadata][copyOfMessage][6] =~ /^\/api\// {
    grok {
      break_on_match => false
      match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{IP:clientip}\|%{DATA:username}\|%{WORD:method}\|%{DATA:resource}\|%{DATA:protocol}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" }
    }
    grok {
      break_on_match => false
      match => { "resource" => "(/[^\/]+)?/[^\/]+/%{DATA:repo}/%{GREEDYDATA:resource_path}" }
    }

  }
  else if [@metadata][copyOfMessage][6] =~ /^\/list\// or [@metadata][copyOfMessage][6] =~ /^\/simple\// {
    grok {
      break_on_match => false
      match => { "message" => "%{DATA:timestamp_local}\|%{NUMBER:duration}\|%{WORD:requesttype}\|%{IP:clientip}\|%{DATA:username}\|%{WORD:method}\|%{DATA:resource}\|%{DATA:protocol}\|%{NUMBER:statuscode}\|%{NUMBER:bytes}" }
    }
    grok {
      break_on_match => false
      match => { "resource" => "(/%{DATA})?/%{DATA:repo}/%{GREEDYDATA:resource_path}" }
    }
  }

Test messages:

20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/api/test/Lighter-test-group|HTTP/1.1|200|452
20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/list/Lighter-test-group|HTTP/1.1|200|452
20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/api/test/Lighter-test-group/2.0|HTTP/1.1|200|452
20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/list/Lighter-test-group/xyz/123|HTTP/1.1|200|452
20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/api/test/Lighter-test-group/2.0/1.2|HTTP/1.1|200|452

the output:

For /api/test/Lighter-test-group/2.0 , repo= Lighter-test-group
For /api/test/Lighter-test-group , repo = test
For /list/Lighter-test-group , repo = list
For /api/test/Lighter-test-group/2.0/1.2 , repo = Lighter-test-group
For /list/Lighter-test-group/xyz/123 , repo = Lighter-test-group

There are no grokparsefailures, but the messages are parsed inconsistently: for resources starting with /list or /api, repo gets different values. All of the messages should be parsed as repo=Lighter-test-group.
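
The different repo values are consistent with how the regex engine backtracks over the optional leading group. A commented walk-through of the /list/ pattern above, offered as an explanation rather than a fix:

```
grok {
  # /list/Lighter-test-group/xyz/123
  #   (/%{DATA})? takes "/list", repo = Lighter-test-group,
  #   resource_path = xyz/123
  #
  # /list/Lighter-test-group
  #   With "/list" consumed by the optional group there is no "/" left
  #   after the repo, so the engine backtracks, skips the optional group,
  #   and matches repo = list, resource_path = Lighter-test-group
  match => { "resource" => "(/%{DATA})?/%{DATA:repo}/%{GREEDYDATA:resource_path}" }
}
```

The same reasoning gives repo = test for /api/test/Lighter-test-group with the /api/ pattern.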

Badger · August 21, 2019, 12:32pm

In that case, make the resource_path optional rather than the first element:

 match => { "resource" => "/%{DATA}/%{DATA:repo}(/%{GREEDYDATA:resource_path})?" }

naveenrt23 · August 21, 2019, 12:35pm

I tried this, but I'm getting a grok parse failure.

Badger · August 21, 2019, 12:51pm

match => { "resource" => "/[^/]+/(?<repo>[^/]+)(/%{GREEDYDATA:resource_path})?" }

That extracts repo correctly. The later groks give a _grokparsefailure because resource_path does not exist in one case.
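
A possible reason the %{DATA:repo} variant did not behave as hoped: DATA is lazy (.*?), and with nothing mandatory after it the shortest match is the empty string. A sketch of the difference, using the patterns from the posts above:

```
# Lazy DATA followed only by an optional group: repo can match zero
# characters and the optional group is then skipped, so for
# /list/Lighter-test-group no usable repo is captured.
#   "/%{DATA}/%{DATA:repo}(/%{GREEDYDATA:resource_path})?"

grok {
  # [^/]+ is greedy and needs at least one character, so repo takes the
  # whole second segment whether or not a further path follows.
  match => { "resource" => "/[^/]+/(?<repo>[^/]+)(/%{GREEDYDATA:resource_path})?" }
}
```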

naveenrt23 · August 21, 2019, 1:34pm

Grok filters:

For list:

      match => { "resource" => "/[^/]+/(?<repo>[^/]+)(/%{GREEDYDATA:resource_path})?" }

For api:

      match => { "resource" => "(/[^\/]+)?/[^/]+/(?<repo>[^/]+)(/%{GREEDYDATA:resource_path})?" }

I've used these filters with the same input requests. For /list/Lighter-test-group and /api/test/Lighter-test-group, the resource and repo are parsed, but I still see a grok parse failure.

Sample output:

{
            "username" => "anonymous",
          "@timestamp" => 2019-08-21T13:29:26.860Z,
              "method" => "POST",
            "resource" => "/api/test/Lighter-test-group",
           "timestamp" => 1565891419000,
          "statuscode" => "200",
                "host" => "user",
            "protocol" => "HTTP/1.1",
    "timestamp_object" => 2019-08-15T17:50:19.000Z,
                "path" => "/Users/hack/test-arti.log",
                "repo" => "Lighter-test-group",
             "message" => "20190815175019|9599|REQUEST|14.56.55.120|anonymous|POST|/api/test/Lighter-test-group|HTTP/1.1|200|452",
     "timestamp_local" => "20190815175019",
                "time" => 1565891419093647427,
         "requesttype" => "REQUEST",
            "clientip" => "14.56.55.120",
               "bytes" => "452",
            "@version" => "1",
            "duration" => "9599",
                "tags" => [
        [0] "_grokparsefailure"
    ]
}

Badger · August 21, 2019, 1:54pm

And I explained why in my previous post.

naveenrt23 · August 21, 2019, 1:56pm

Since we made resource_path optional, is it expected to throw an error if it does not exist? Is there any way we can parse the repos and remove the error message?
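
One common way to handle this, sketched under the assumption that events without a resource_path simply should not get resource_name or resource_type, is to run the later groks only when the field exists:

```
# Only extract name and extension when a path was actually captured
if [resource_path] {
  grok {
    match => { "resource_path" => "(?<resource_name>[^/]+$)" }
  }
  grok {
    match => { "resource_path" => "(?<resource_type>[^.]+$)" }
  }
}
```

Setting tag_on_failure => [] on those two groks would also suppress the tag, at the cost of hiding genuine mismatches.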
