Grok pattern query for access logs

I am using the query below to extract the IP address, timestamp, and other fields from access log lines like this one:

192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 "GET /xxxx-xxxx-xxxx/xxxxxxxxxx/xxx/xxxx/xxxxxxxxxx HTTP/1.1" 200 4534 
GET xxxx-xxxx-xxxx/xxxxxxxxxx/xxx/xxxx/xxxxxxxxxx HTTP/1.1
PUT _ingest/pipeline/access_log
{
  "description" : "Ingest pipeline for Combined Log Format",
  "processors" : [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:timestamp}\\] \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int})"]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": [ "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
    }
  ]
}
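To check the date processor's format against the sample timestamp outside Elasticsearch, the format can be mirrored with Python's datetime.strptime. This is only a rough sketch: the mapping of the Java-style pattern to "%d/%b/%Y:%H:%M:%S %z" is my own assumption, not something Elasticsearch uses internally. Note that in Java date patterns lowercase yyyy is the calendar year, while uppercase YYYY is the week-based year, which can differ around New Year.

```python
from datetime import datetime, timezone

# Assumed Python equivalent of the Java-style format "dd/MMM/yyyy:HH:mm:ss Z"
HTTPDATE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_httpdate(value):
    """Parse an access-log timestamp such as '18/Nov/2019:13:42:14 -0800'."""
    return datetime.strptime(value, HTTPDATE_FORMAT)


ts = parse_httpdate("18/Nov/2019:13:42:14 -0800")
print(ts.isoformat())                           # 2019-11-18T13:42:14-08:00
print(ts.astimezone(timezone.utc).isoformat())  # 2019-11-18T21:42:14+00:00
```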

Why is the grok pattern failing on this line?

This seems to be fine. What is the error that you are getting?

This is the error:

{
  "docs" : [
    {
      "error" : {
        "root_cause" : [
          {
            "type" : "exception",
            "reason" : """java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 "GET /jdbc-data-server/jdbcdataserver/data/discover/TransactionsDemo HTTP/1.1" 200 4534]""",
            "header" : {
              "processor_type" : "grok"
            }
          }
        ],
        "type" : "exception",
        "reason" : """java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 "GET /jdbc-data-server/jdbcdataserver/data/discover/TransactionsDemo HTTP/1.1" 200 4534]""",
        "caused_by" : {
          "type" : "illegal_argument_exception",
          "reason" : """java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 "GET /jdbc-data-server/jdbcdataserver/data/discover/TransactionsDemo HTTP/1.1" 200 4534]""",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : """Provided Grok expressions do not match field value: [192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 "GET /jdbc-data-server/jdbcdataserver/data/discover/TransactionsDemo HTTP/1.1" 200 4534]"""
          }
        },
        "header" : {
          "processor_type" : "grok"
        }
      }
    }
  ]
}

You need to revisit your grok pattern. As the error clearly says, it does not match the field value. The log line has an extra "-" field, and the timestamp is not wrapped in square brackets, so the pattern has to account for both. The corrected grok pattern would be:

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{USER:extra} %{HTTPDATE:timestamp} \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int})
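To see why the pattern from the question fails and the corrected one matches, here is a rough Python emulation of grok. It expands each %{NAME:field} or %{NAME:field:type} token into a named regex group; the GROK_PATTERNS entries are simplified approximations I chose for illustration (an assumption, not the definitions that ship with grok), and Python's re module is not the regex engine grok actually uses.

```python
import re

# Simplified stand-ins for a few standard grok patterns (assumption: approximations only)
GROK_PATTERNS = {
    "IPORHOST": r"[A-Za-z0-9._-]+",
    "USER": r"[a-zA-Z0-9._-]+",
    "HTTPDATE": r"\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}",
    "WORD": r"\w+",
    "DATA": r".*?",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
}

# Matches %{NAME:field} and %{NAME:field:type}
TOKEN = re.compile(r"%\{(\w+):(\w+)(?::(\w+))?\}")


def grok_match(pattern, line):
    """Expand grok tokens into named groups; return the captured fields or None."""
    int_fields = []

    def expand(m):
        name, field, type_ = m.groups()
        if type_ == "int":
            int_fields.append(field)
        return f"(?P<{field}>{GROK_PATTERNS[name]})"

    # re.search: the expanded pattern may match anywhere in the line
    match = re.search(TOKEN.sub(expand, pattern), line)
    if match is None:
        return None
    # Drop groups that did not participate (e.g. bytes when the value is "-")
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    for field in int_fields:
        if field in fields:
            fields[field] = int(fields[field])
    return fields


LINE = ('192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 '
        '"GET /jdbc-data-server/jdbcdataserver/data/discover/TransactionsDemo HTTP/1.1" 200 4534')

# Pattern from the question: brackets around the date, only two "-" fields
ORIGINAL = (r'%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] '
            r'\"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" '
            r'%{NUMBER:response:int} (?:-|%{NUMBER:bytes:int})')

# Corrected pattern: one extra USER field and no brackets
CORRECTED = (r'%{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{USER:extra} %{HTTPDATE:timestamp} '
             r'\"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" '
             r'%{NUMBER:response:int} (?:-|%{NUMBER:bytes:int})')

print(grok_match(ORIGINAL, LINE))  # None
print(grok_match(CORRECTED, LINE))
```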

Hi @nishant.saini, thanks for the feedback. It helped!

Up to %{HTTPDATE:timestamp} the pattern works without an exception, but the part below still fails on the text after GET:

\"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}

What is the issue here? And is there a reference or documentation I can look at?

Are you still getting the same error? I don't see anything wrong with the pattern you posted, other than the missing ) at the end:

\"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}

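An unbalanced parenthesis makes the expression syntactically invalid rather than merely non-matching. A minimal way to see this is to compile a hand-expanded version of the tail of the pattern with Python's re module; the expansions of the grok tokens are simplified assumptions, and Python's regex engine is not the one grok uses, but both reject an unclosed group.

```python
import re


def compiles(regex):
    """Return True if the regex compiles, False if it is syntactically invalid."""
    try:
        re.compile(regex)
        return True
    except re.error:
        return False


# Simplified expansion of: %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}
BROKEN = r'(?P<response>\d+) (?:-|(?P<bytes>\d+)'   # the (?: group is never closed
FIXED = r'(?P<response>\d+) (?:-|(?P<bytes>\d+))'

print(compiles(BROKEN))  # False
print(compiles(FIXED))   # True
```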
The regular expressions behind the grok patterns can be found here. You can use the Grok Debugger in Kibana Dev Tools to test a grok pattern against a sample string (log line).
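Besides the Grok Debugger, the whole pipeline can be tested against sample lines with the simulate API (POST _ingest/pipeline/access_log/_simulate), which is what produces a "docs" response like the error output earlier in this thread. The sketch below only builds that request with Python's standard library and does not send it; the cluster URL is an assumption (a local cluster without security).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def simulate_request(pipeline_id, *log_lines):
    """Build a _simulate request that runs each raw log line through the pipeline."""
    # Each line becomes a document whose "message" field the grok processor reads
    body = {"docs": [{"_source": {"message": line}} for line in log_lines]}
    return urllib.request.Request(
        f"{ES_URL}/_ingest/pipeline/{pipeline_id}/_simulate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


LINE = ('192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 '
        '"GET /jdbc-data-server/jdbcdataserver/data/discover/TransactionsDemo HTTP/1.1" 200 4534')
req = simulate_request("access_log", LINE)

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```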

Using the Grok Debugger in Kibana Dev Tools, I tested the pattern as you provided it, but it gives the error "Provided grok patterns do not match data in the input" with this pattern:

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{USER:extra} %{HTTPDATE:timestamp} \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int})

For this data:

192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 "GET /xxxxxxxxxxxxxxxx/xxxxxxxxxxx/data/discover/xxxxxxxxxxxxxxxx HTTP/1.1"

So is there an issue with the grok pattern?

The two log lines have different formats, and that is why the pattern does not match. These are the two types of log lines you are dealing with:

192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 "GET /jdbc-data-server/jdbcdataserver/data/discover/TransactionsDemo HTTP/1.1" 200 4534
192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 "GET /xxxxxxxxxxxxxxxx/xxxxxxxxxxx/data/discover/xxxxxxxxxxxxxxxx HTTP/1.1"

Notice that the second log line is missing the response code and bytes fields, which is why you are getting the error:

"Provided grok patterns do not match data in the input"

Looking at the grok pattern, %{NUMBER:response:int} means the response code must always be present in the log line, whereas (?:-|%{NUMBER:bytes:int}) means the bytes field must be present as either a number or -.

To solve this, either make sure every log line follows the same format, or make the trailing grok expressions optional.
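One possible way to make the trailing fields optional is to wrap them in an optional non-capturing group, i.e. in grok: ...HTTP/%{NUMBER:httpversion}\"(?: %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}))? . The Python sketch below hand-expands that idea into a plain regular expression (the expansions of the grok patterns are simplified assumptions) and checks it against both log lines from this thread.

```python
import re

# Hand-expanded, simplified equivalent of the grok pattern with an optional tail
ACCESS_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) (?P<extra>\S+) '
    r'(?P<timestamp>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}) '
    r'"(?P<verb>\w+) (?P<request>.*?) HTTP/(?P<httpversion>[\d.]+)"'
    # The whole " response bytes" tail is optional; bytes may also be "-"
    r'(?: (?P<response>\d+) (?:-|(?P<bytes>\d+)))?'
)


def parse(line):
    """Return the captured fields (response/bytes as ints when present), or None."""
    m = ACCESS_LOG.search(line)
    if m is None:
        return None
    fields = {k: v for k, v in m.groupdict().items() if v is not None}
    for key in ("response", "bytes"):
        if key in fields:
            fields[key] = int(fields[key])
    return fields


FULL = ('192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 '
        '"GET /jdbc-data-server/jdbcdataserver/data/discover/TransactionsDemo HTTP/1.1" 200 4534')
SHORT = ('192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 '
         '"GET /xxxxxxxxxxxxxxxx/xxxxxxxxxxx/data/discover/xxxxxxxxxxxxxxxx HTTP/1.1"')

print(parse(FULL))
print(parse(SHORT))  # same fields, but without response and bytes
```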

It worked, thank you! But this is only in Dev Tools. What if I want the parsed information to show up in the Kibana Discover tab?

So where is this PUT request for the pipeline stored? In which file, in particular?

Go through the ingest node documentation here and the subsequent topics to understand how it works.
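For the last two questions: Elasticsearch keeps ingest pipelines in the cluster state rather than in a file you edit, and the definition can be read back with GET _ingest/pipeline/access_log. The parsed fields only appear in Discover for documents that were indexed through the pipeline, for example by adding ?pipeline=access_log to the index request (the index.default_pipeline index setting is another option), and Kibana also needs an index pattern covering that index. The sketch below builds both requests with Python's standard library without sending them; the cluster URL and the index name access-logs are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication
INDEX = "access-logs"             # hypothetical index name


def index_request(message, pipeline="access_log"):
    """Build a request that indexes one raw log line through the ingest pipeline."""
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_doc?pipeline={pipeline}",
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_pipeline_request(pipeline="access_log"):
    """Build a request that reads the stored pipeline definition from the cluster."""
    return urllib.request.Request(f"{ES_URL}/_ingest/pipeline/{pipeline}", method="GET")


LINE = ('192.168.10.182 - - - 18/Nov/2019:13:42:14 -0800 '
        '"GET /jdbc-data-server/jdbcdataserver/data/discover/TransactionsDemo HTTP/1.1" 200 4534')
index_req = index_request(LINE)
pipeline_req = get_pipeline_request()
# Send either one with urllib.request.urlopen(...) against a running cluster.
```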