Filebeat nginx module

Hello,

I am trying to use Filebeat with the nginx module to collect logs from the nginx-ingress-controller and send them directly to Elasticsearch, but I keep getting this error:

Provided Grok expressions do not match field value: [172.17.0.1 - - [03/Dec/2022:00:05:01 +0000] \"GET /healthz HTTP/1.1\" 200  0 \"-\" \"kube-probe/1.24\" \"-\"]

My Filebeat settings:

      filebeat.autodiscover:
        providers:
          - type: kubernetes
            hints.enabled: false
            templates:
              - condition:
                  contains:
                    kubernetes.pod.name: redis
                config:
                  - module: redis
                    log:
                      input:
                        type: container
                        containers.ids:
                          - "${data.kubernetes.container.id}"
                        paths:
                          - /var/log/containers/*${data.kubernetes.container.id}.log

              - condition:
                  contains:
                    kubernetes.pod.name: nginx
                config:
                - module: nginx
                  access:
                    enabled: true
                    input:
                      type: container
                      containers.ids:
                        - "${data.kubernetes.container.id}"
                      paths:
                      - /var/lib/docker/containers/${data.kubernetes.container.id}/*.log


      output.elasticsearch:
        host: '${NODE_NAME}'
        hosts: '["https://${ELASTICSEARCH_HOSTS:elasticsearch-master:9200}"]'
        username: '${ELASTICSEARCH_USERNAME}'
        password: '${ELASTICSEARCH_PASSWORD}'
        protocol: https
        ssl.certificate_authorities: ["/usr/share/filebeat/certs/ca.crt"]
      setup.ilm:
        enabled: true
        overwrite: true
        policy_file: /usr/share/filebeat/ilm.json
      setup.dashboards.enabled: true
      setup.kibana.host: "http://kibana:5601"
    ilm.json: |
      {
        "policy": {
          "phases": {
            "hot": {
              "actions": {
                "rollover": {
                  "max_age": "1d"
                }
              }
            },
            "delete": {
              "min_age": "7d",
              "actions": {
                "delete": {}
              }
            }
          }
        }
      }

I installed the ingress controller from this Helm chart:

And the logs are:

172.17.0.1 - - [02/Dec/2022:23:43:49 +0000] "GET /healthz HTTP/1.1" 200  0 "-" "kube-probe/1.24" "-"
172.17.0.1 - - [02/Dec/2022:23:43:54 +0000] "GET /healthz HTTP/1.1" 200  0 "-" "kube-probe/1.24" "-"
172.17.0.1 - - [02/Dec/2022:23:43:54 +0000] "GET /healthz HTTP/1.1" 200  0 "-" "kube-probe/1.24" "-"
172.17.0.1 - - [02/Dec/2022:23:43:59 +0000] "GET /healthz HTTP/1.1" 200  0 "-" "kube-probe/1.24" "-"

Can someone help me understand the issue?

Well, bottom line: that is not a standard NGINX access log line... so it is not parsing... in fact I am not sure of the format.

217.138.222.101 - - [11/Feb/2022:13:22:11 +0000] "GET /favicon.ico HTTP/1.1" 404 3650 "http://135.181.110.245/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36" "-"

To me it looks like you have a hard tab / wrong spacing instead of a single space between the status code and the bytes (the 200 and the 0), and you are missing the user agent data completely.

So if I take your log, clean up the spacing, and add user agent data, then it is a standard nginx log...

172.17.0.1 - - [02/Dec/2022:23:43:49 +0000] "GET /healthz HTTP/1.1" 200 0 "kube-probe/1.24" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36" "-"

So you will need to configure your ingress controller to format its output in the standard format... there are lots of articles on that... here and here.
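For example, with the community ingress-nginx controller the access log format can be set through the controller's ConfigMap. A sketch only (the ConfigMap name / namespace here are assumptions, match them to your deployment); the format string is the standard combined format plus X-Forwarded-For, like the sample line above:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumption: must match the controller's --configmap flag
  namespace: ingress-nginx         # assumption: your ingress namespace
data:
  # standard NGINX combined log format plus $http_x_forwarded_for
  log-format-upstream: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"'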

Or you will need to adjust / create your own parsing: edit the module's existing ingest pipeline or create your own.

Then the cleaned-up line parses to this (ignore the timestamp error, that will be fixed when the event comes through Filebeat):

   {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "3",
        "_source": {
          "nginx": {
            "access": {
              "time": "02/Dec/2022:23:43:49 +0000",
              "remote_ip_list": [
                "172.17.0.1"
              ]
            }
          },
          "_tmp": {},
          "http": {
            "request": {
              "referrer": "kube-probe/1.24",
              "method": "GET"
            },
            "response": {
              "status_code": 200,
              "body": {
                "bytes": 0
              }
            },
            "version": "1.1"
          },
          "source": {
            "address": "172.17.0.1",
            "ip": "172.17.0.1"
          },
          "event": {
            "ingested": "2022-12-03T05:11:44.964567327Z",
            "original": """172.17.0.1 - - [02/Dec/2022:23:43:49 +0000] "GET /healthz HTTP/1.1" 200 0 "kube-probe/1.24" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36" "-"""
          },
          "error": {
            "message": "field [@timestamp] not present as part of path [@timestamp]" <!--- Ignore this
          },
          "user_agent": {
            "original": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.87 Safari/537.36"
          },
          "url": {
            "path": "/healthz",
            "original": "/healthz",
            "scheme": null,
            "domain": null
          }
        },
        "_ingest": {
          "timestamp": "2022-12-03T05:11:44.964567327Z"
        }
      }
    }
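If you go the custom-parsing route mentioned above: in recent Filebeat versions the module pipeline ships as YAML (something like module/nginx/access/ingest/pipeline.yml under the Filebeat home, the exact path varies by version). A rough sketch of a loosened grok processor that would tolerate the double space in your healthz lines... this is my sketch, not the module's actual pattern:

- grok:
    field: message
    patterns:
      # \s+ instead of a single space between the status code and the bytes;
      # field names follow ECS the way the parsed output above does
      - '%{IPORHOST:source.address} - %{DATA:user.name} \[%{HTTPDATE:nginx.access.time}\] "%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}" %{NUMBER:http.response.status_code}\s+%{NUMBER:http.response.body.bytes} "%{DATA:http.request.referrer}" "%{DATA:user_agent.original}" "%{DATA:_tmp.xff}"'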

Hey @stephenb, thanks for the reply.

I ended up solving this issue with the following config:

              - condition:
                  equals:
                    kubernetes.deployment.name: nginx-ingress-controller
                config:
                - module: nginx
                  access:
                    enabled: true
                    input:
                      type: container
                      containers.ids:
                        - "${data.kubernetes.container.id}"
                      paths:
                      - /var/lib/docker/containers/${data.kubernetes.container.id}/*.log
                      prospector:
                        document_type: nginx_access
                        json.message_key: "log"
                        json.keys_under_root: true

From the tests I have run so far, it is working.

Not sure exactly how and why (it is a bit of a black box for me), but thanks anyway.

Can I ask for your help with another log?

I have an app that outputs logs in the format: loglevel ; date time ; long message.
Example:

DEBUG   ; 2022-11-29 05:24:11;   bla bla bla..

So I guess I can use processors like so:

processors:
  - dissect:
      tokenizer: '%{log.level} ;%{+DATE}; %{msg}'
      field: "message"
      target_prefix: ""

My questions are:

  - How do I make DATE become the actual @timestamp of the event?
  - Sometimes the last part (msg) is multiline. If the date were at the beginning of the line, I could have used multiline:
                    multiline:
                      pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'
                      negate: true
                      match: after

But how can I use it when the log starts with the log level?

Hi @amir_Bialek

Glad you got the nginx module to work... the config looks good... My guess at the how and why: the json.* options unwrap the Docker JSON wrapper around each container log line, so the module's grok finally sees the raw nginx access line. Hard to say for sure without looking at the events.

Next time, please open a separate thread for new topics.

For the timestamp, look at the timestamp processor.

The layout is a bit tricky but it should be something like the below.
Also, all these times will be interpreted as GMT unless you provide a time zone.

processors:
  - timestamp:
      field: DATE
      layouts:
        - '2006-01-02 15:04:05'
      test:
        - '2022-11-29 05:24:11'
  - drop_fields:
      fields: [DATE]
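
If the app writes local time rather than UTC, the processor also takes a timezone option; for example (the zone below is just a placeholder):

processors:
  - timestamp:
      field: DATE
      timezone: 'Europe/Berlin'   # placeholder, set to whatever zone your app logs in
      layouts:
        - '2006-01-02 15:04:05'
      test:
        - '2022-11-29 05:24:11'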

You should open a separate thread on the multiline question, give it a proper title, and include samples of the data...

Hello @stephenb, many thanks for your help and guidance!

I have not yet found a solution to catch the multiline messages that start with a log level, but for the logs that start with a date I have working config, for example: 2022-12-05 23:19:22 INFO bla bla bla..
I am adding the solution here in case someone else needs it:

      filebeat.autodiscover:
        providers:
          - type: kubernetes
            hints.enabled: false
            templates:
              - condition.or:
                  - contains:
                      kubernetes.pod.name: commands        
                config:
                  - type: container
                    paths:
                      - /var/log/containers/*.log
                    multiline:
                      pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'
                      negate: true
                      match: after
                    processors:
                    - dissect:
                        # tokenizer syntax: https://www.elastic.co/guide/en/logstash/current/plugins-filters-dissect.html.
                        tokenizer: "%{+timestamp} %{+timestamp} %{log.level} %{?message}"
                        # https://www.elastic.co/guide/en/beats/filebeat/master/dissect.html
                        field: "message"
                        target_prefix: ""

This works perfectly and it catches the multiline, so I will try to convince my team to change the log structure to start with the timestamp, which should be the standard.
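
To also promote the dissected timestamp into @timestamp, I plan to chain the timestamp processor from above onto it, roughly like this (I still need to verify what separator the two appended %{+timestamp} parts are joined with before trusting the layout):

processors:
  - timestamp:
      field: timestamp
      layouts:
        - '2006-01-02 15:04:05'   # assumes the appended parts keep the single space
      test:
        - '2022-12-05 23:19:22'
  - drop_fields:
      fields: [timestamp]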

Perhaps if you provide some sample logs, someone might be able to help...

I would add that it is a bit unusual to start a log line with the log level; the much more common / best practice is to start with the timestamp.
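
That said, if the format stays level-first, you could try anchoring the multiline pattern on the level instead. A rough sketch, assuming every new event starts with one of a fixed set of levels followed by a semicolon:

multiline:
  pattern: '^(DEBUG|INFO|WARN|ERROR)\s*;'   # assumption: your app only emits these levels
  negate: true
  match: after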
