Can't get grok working with nginx logs

Hi,
Can someone help me with nginx logs, please?

I have a full nginx log line:

2021-12-28T17:00:34+00:00 site="demo" server="demotest" dest_port="443" dest_ip="172.19.0.3" src="172.31.27.65" src_ip="172.31.27.65" user="-" time_local="28/Dec/2021:17:00:34 +0000" protocol="HTTP/1.1" status="200" bytes_out="1076" bytes_in="914" http_referer="-" http_user_agent="ELB-HealthChecker/2.0" nginx_version="1.21.4" http_x_forwarded_for="-" http_x_header="-" uri_query="-" uri_path="/" http_method="GET" response_time="0.000" cookie="-" request_time="0.000" category="text/html" https="on"

So I have updated filebeat/module/nginx/access/ingest/default.json to this:
"patterns":[
"%{TIMESTAMP_ISO8601:nginx.access.time} site="%{DATA:nginx.access.server_name}" dest_port="%{NUMBER:nginx.access.server_port}" dest_ip="%{IP_LIST:nginx.access.server_addr}" src="%{IP_LIST:nginx.access.remote_ip_host}" src_ip="%{IP_LIST:nginx.access.remote_ip_list}" user="%{DATA:nginx.access.user_name}" time_local="%{GREEDYDATA:nginx.access.time_local}" protocol="%{DATA:nginx.access.http_version}" status="%{NUMBER:nginx.access.response_code}" bytes_out="%{NUMBER:nginx.access.body_sent.bytes}" bytes_in="%{NUMBER:nginx.access.body_received.bytes}" http_referer="%{GREEDYDATA:nginx.access.referrer}" http_user_agent="%{DATA:nginx.access.agent}" nginx_version="%{NUMBER:nginx.access.nginx_version}" http_x_forwarder_for="%{DATA:nginx.access.http_x_forwarder_for}" http_x_header="%{DATA:nginx.access.http_x_header}" uri_query="%{DATA:nginx.access.query_string}" uri_path="%{DATA:nginx.access.url}" http_method="%{DATA:nginx.access.method}" response_time="%{NUMBER:nginx.access.response.seconds}" cookie="%{GREEDYDATA:nginx.access.cookie}" request_time="%{NUMBER:nginx.access.request.seconds}" category="%{DATA:nginx.access.category}" https="%{DATA:nginx.accesshttps}""
]

What am I doing wrong?

Thank you so much

Hi @Constantine Welcome to the community.

Couple things please,

Please format your code samples using the </> format button above; that helps tremendously, because right now we basically can't read your issue.

2nd, what version of the stack are you on?

3rd... yes, Grok can take a little work...

You have a number of issues at a glance.

I would highly recommend referring to the patterns here

I highly recommend using the incremental grok constructor here; it does what I will recommend below.

And of course the Grok Debugger built into Kibana - Dev Tools

You have a number of errors. Among the first things I see: you are using DATA and GREEDYDATA in places you should not, you are missing the server="demotest" field, and it looks like you are using IP_LIST instead of the defined NGINX_ADDRESS_LIST.

So let's get started...

Pro Tip 1: Build your Grok parser incrementally, one or a couple of fields per iteration, ending with %{GREEDYDATA:rest_of_message}, and keep building until you reach the end of the message. This is basically what the incremental constructor I mentioned above does, but you can do the same with the built-in Grok Debugger.

Pro Tip 2: Use the most specific pattern you can for each field; instead of DATA, perhaps use HOSTNAME, etc.

Pro Tip 3: Keep your work in a code editor as you go, as these tools have a way of resetting if you accidentally navigate away.

Pro Tip 4: Get this all working BEFORE you try to put it into the Filebeat config...
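To illustrate Pro Tip 1, the first couple of iterations against your sample line might look like this (field names are just examples; you keep moving the boundary of rest_of_message to the right each round):

```text
# Iteration 1: capture only the timestamp, dump everything else into the tail
%{TIMESTAMP_ISO8601:nginx.access.time} %{GREEDYDATA:rest_of_message}

# Iteration 2: peel one more field off the front, keep the tail
%{TIMESTAMP_ISO8601:nginx.access.time} site="%{HOSTNAME:nginx.access.site_name}" %{GREEDYDATA:rest_of_message}
```

Each iteration either matches (keep going) or fails (the field you just added is the problem), which is what makes debugging tractable.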

So I got you started... See the Details and Image below

I set your message to:

2021-12-28T17:00:34+00:00 site="demo" server="demotest" dest_port="443" dest_ip="172.19.0.3" src="172.31.27.65" src_ip="172.31.27.65" user="-" time_local="28/Dec/2021:17:00:34 +0000" protocol="HTTP/1.1" status="200" bytes_out="1076" bytes_in="914" http_referer="-" http_user_agent="ELB-HealthChecker/2.0" nginx_version="1.21.4" http_x_forwarded_for="-" http_x_header="-" uri_query="-" uri_path="/" http_method="GET" response_time="0.000" cookie="-" request_time="0.000" category="text/html" https="on"

I added the custom GROK pattern definition. (NOTE THE DIFFERENCE in syntax: the filebeat ingest pipeline uses a ":" between the pattern name and the pattern, while the Grok Debugger does not.)

NGINX_ADDRESS_LIST (?:%{IP}|%{WORD})("?,?\s*(?:%{IP}|%{WORD}))*

You will probably need all 3:

  NGINX_HOST (?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port})?
  NGINX_NOTSEPARATOR "[^\t ,:]+"
  NGINX_ADDRESS_LIST (?:%{IP}|%{WORD})("?,?\s*(?:%{IP}|%{WORD}))*
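When you later move these into an actual ingest pipeline, they go under the grok processor's pattern_definitions option. A minimal sketch (pattern names as above, escaping adjusted for JSON; the single pattern shown is just a placeholder first iteration):

```json
"grok": {
  "field": "message",
  "pattern_definitions": {
    "NGINX_NOTSEPARATOR": "[^\\t ,:]+",
    "NGINX_ADDRESS_LIST": "(?:%{IP}|%{WORD})(\"?,?\\s*(?:%{IP}|%{WORD}))*"
  },
  "patterns": [
    "%{TIMESTAMP_ISO8601:nginx.access.time} %{GREEDYDATA:rest_of_message}"
  ]
}
```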

Then I am working my way through....

Here is a partial solution

%{TIMESTAMP_ISO8601:nginx.access.time} site="%{HOSTNAME:nginx.access.site_name}" server="%{HOSTNAME:nginx.access.server_name}" dest_port="%{NUMBER:nginx.access.server_port}" dest_ip="%{NGINX_ADDRESS_LIST:nginx.access.server_addr}" src="%{NGINX_ADDRESS_LIST:nginx.access.remote_ip_host}" src_ip="%{NGINX_ADDRESS_LIST:nginx.access.remote_ip_list}" user="%{DATA:nginx.access.user_name}" %{GREEDYDATA:rest_of_message}

And the Current Parsed Results...

{
  "nginx": {
    "access": {
      "site_name": "demo",
      "server_name": "demotest",
      "user_name": "-",
      "remote_ip_host": "172.31.27.65",
      "server_port": "443",
      "time": "2021-12-28T17:00:34+00:00",
      "server_addr": "172.19.0.3",
      "remote_ip_list": "172.31.27.65"
    }
  },
  "rest_of_message": "time_local=\"28/Dec/2021:17:00:34 +0000\" protocol=\"HTTP/1.1\" status=\"200\" bytes_out=\"1076\" bytes_in=\"914\" http_referer=\"-\" http_user_agent=\"ELB-HealthChecker/2.0\" nginx_version=\"1.21.4\" http_x_forwarded_for=\"-\" http_x_header=\"-\" uri_query=\"-\" uri_path=\"/\" http_method=\"GET\" response_time=\"0.000\" cookie=\"-\" request_time=\"0.000\" category=\"text/html\" https=\"on\""
}

Now Just Keep Iterating... let us know if you get stuck


A much simpler method would be to use grok to extract the timestamp and the KV processor | Elasticsearch Guide [7.16] | Elastic to parse the remaining fields. Then you can use additional processors to clean anything up.
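To make the kv idea concrete, here is a small Python sketch of the same logic (this is not how Elasticsearch runs it internally; it just mirrors the field_split lookahead and trim_value settings used in the pipeline below, on a shortened sample line):

```python
import re

line = 'site="demo" server="demotest" dest_port="443" user="-"'

# Split on a space only when it is immediately followed by a new key=
# token, mirroring the kv processor's field_split lookahead regex.
pairs = re.split(r' (?=[a-z_\-]+=)', line)

fields = {}
for pair in pairs:
    key, _, value = pair.partition('=')   # mirrors value_split: "="
    fields[key] = value.strip('"')        # mirrors trim_value: "\""

print(fields)
```

Because the split requires a key= token after the space, values that themselves contain spaces (like time_local="28/Dec/2021:17:00:34 +0000") stay intact.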


Darn good call @legoguy1000 !!

Didn't see the forest for the trees. Didn't recognize they were all key/value pairs.

Here is a sample of what @legoguy1000 is talking about.

So now you have 2 options... This is just a sample: you could create your own pipeline, name it what you like, and then set it. I think you will still need to add in the rest of the steps from the original pipeline...

DELETE _ingest/pipeline/discuss-nginx

PUT _ingest/pipeline/discuss-nginx
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{TIMESTAMP_ISO8601:nginx.access.time} %{GREEDYDATA:message_detail}"
        ]
      }
    },
    {
      "kv": {
        "field": "message_detail",
        "field_split": """ (?=[a-z\_\-]+=)""",
        "value_split": "=",
        "ignore_missing": true,
        "ignore_failure": false,
        "trim_value": "\"",
        "strip_brackets": true,
        "target_field" : "nginx.access"
      }
    },
    {
      "remove": {
        "field": "message_detail"
      }
    }
  ]
}

POST _ingest/pipeline/discuss-nginx/_simulate
{
  "docs": [
    {
      "_source": {
        "message": """2021-12-28T17:00:34+00:00 site="demo" server="demotest" dest_port="443" dest_ip="172.19.0.3" src="172.31.27.65" src_ip="172.31.27.65" user="-" time_local="28/Dec/2021:17:00:34 +0000" protocol="HTTP/1.1" status="200" bytes_out="1076" bytes_in="914" http_referer="-" http_user_agent="ELB-HealthChecker/2.0" nginx_version="1.21.4" http_x_forwarded_for="-" http_x_header="-" uri_query="-" uri_path="/" http_method="GET" response_time="0.000" cookie="-" request_time="0.000" category="text/html" https="on"""
        
      }
    }
  ]
}

Result

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "message" : """2021-12-28T17:00:34+00:00 site="demo" server="demotest" dest_port="443" dest_ip="172.19.0.3" src="172.31.27.65" src_ip="172.31.27.65" user="-" time_local="28/Dec/2021:17:00:34 +0000" protocol="HTTP/1.1" status="200" bytes_out="1076" bytes_in="914" http_referer="-" http_user_agent="ELB-HealthChecker/2.0" nginx_version="1.21.4" http_x_forwarded_for="-" http_x_header="-" uri_query="-" uri_path="/" http_method="GET" response_time="0.000" cookie="-" request_time="0.000" category="text/html" https="on""",
          "nginx" : {
            "access" : {
              "server" : "demotest",
              "bytes_in" : "914",
              "nginx_version" : "1.21.4",
              "http_user_agent" : "ELB-HealthChecker/2.0",
              "src_ip" : "172.31.27.65",
              "protocol" : "HTTP/1.1",
              "uri_path" : "/",
              "http_method" : "GET",
              "request_time" : "0.000",
              "http_x_header" : "-",
              "https" : "on",
              "dest_port" : "443",
              "cookie" : "-",
              "src" : "172.31.27.65",
              "time_local" : "28/Dec/2021:17:00:34 +0000",
              "site" : "demo",
              "uri_query" : "-",
              "bytes_out" : "1076",
              "http_referer" : "-",
              "dest_ip" : "172.19.0.3",
              "http_x_forwarded_for" : "-",
              "response_time" : "0.000",
              "time" : "2021-12-28T17:00:34+00:00",
              "category" : "text/html",
              "user" : "-",
              "status" : "200"
            }
          }
        },
        "_ingest" : {
          "timestamp" : "2021-12-29T03:45:12.29204534Z"
        }
      }
    }
  ]
}

You might need to rename some of the fields to match the original field names...
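A rename processor per field would do it. A sketch, assuming you want the kv-produced status field to match the response_code name used in your original grok attempt:

```json
{
  "rename": {
    "field": "nginx.access.status",
    "target_field": "nginx.access.response_code",
    "ignore_missing": true
  }
}
```

Add one such processor to the pipeline's processors array for each field you want renamed.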

Thanks everyone for such an effort to sort this. Using the patterns manual I managed to parse everything.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.