Filebeat Grok pattern for access log

Hi, I have an access log that I am trying to write a Grok pattern for, but in the Filebeat log I always see "Provided Grok expressions do not match field value:". The log entries look like:

[20/Oct/2023:09:52:33 +0000] 172.28.0.1 TLSv1.3 NONE localhost sha256WithRSAEncryption "GET /index.php?2=2 HTTP/1.1" 24690 12479
[20/Oct/2023:09:52:42 +0000] 172.28.0.1 TLSv1.3 NONE localhost sha256WithRSAEncryption "GET /index.php?hello=hello HTTP/1.1" 24650 5550
[20/Oct/2023:09:53:02 +0000] 172.28.0.1 TLSv1.3 NONE - sha256WithRSAEncryption "-" - 44
[20/Oct/2023:10:29:09 +0000] 172.28.0.1 TLSv1.3 NONE localhost sha256WithRSAEncryption "GET /index.php?hello=hello HTTP/1.1" 24707 7202
[20/Oct/2023:10:29:11 +0000] 172.28.0.1 TLSv1.3 NONE localhost sha256WithRSAEncryption "GET /index.php?hello=hello HTTP/1.1" 24705 4499
[20/Oct/2023:10:29:29 +0000] 172.28.0.1 TLSv1.3 NONE localhost sha256WithRSAEncryption "-" - 160
[20/Oct/2023:10:30:27 +0000] 172.28.0.1 TLSv1.3 NONE - sha256WithRSAEncryption "GET /index.php?x=x HTTP/1.1" 24610 4878
[20/Oct/2023:10:30:47 +0000] 172.28.0.1 TLSv1.3 NONE - sha256WithRSAEncryption "-" - 194

The ingest pipeline containing my Grok pattern, which I uploaded to the Elasticsearch server, is:

{
  "description": "Apache Access Log Pipeline",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "\\[%{HTTPDATE:timestamp}\\] %{IP:client_ip} %{DATA:ssl_protocol} %{WORD:ssl_client_verify} %{DATA:ssl_tls_sni} %{DATA:ssl_server_a_sig} %{GREEDYDATA:request} %{NUMBER:bytes_sent} %{NUMBER:request_duration}"
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "target_field": "@timestamp",
        "formats": ["dd/MMM/yyyy:HH:mm:ss Z"]
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}

My filebeat.yml file is:

filebeat.inputs:

- type: log

  id: apache-logs
  enabled: true
  paths:
    - /var/log/apache2/access.log

setup.template.name: "ohs2-index"
setup.template.pattern: "ohs2-index-*"


filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

setup.dashboards.enabled: true

setup.kibana:

  host: "localhost:5601"

setup.template.settings:
  index.number_of_shards: 1

setup.ilm.enabled: false

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]

  # Authentication credentials - either API key or username/password.
  username: "elastic"
  password: "xcxcsvxcv"

  index: "ohs2-index-%{+yyyy.MM.dd}"

  pipeline: "ohs-ingest"

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

I think that is all I need to do, but this is the first time I have experimented with Elasticsearch, Kibana and Filebeat, so I'm not sure. Is anyone able to advise what I am doing wrong, please?

Thank you

Are all messages failing, or just some of them? I tested your grok here with some of your sample messages and it worked for some of them but failed on others.

The main issue is that some lines do not have a value for bytes_sent: they have a hyphen instead, while your grok expects a number.

You have this at the end of your grok:

%{NUMBER:bytes_sent} %{NUMBER:request_duration}

But not all of your messages end with two numbers; some of them end like this:

- 160
- 44
- 194

Those messages will fail, so you need to change %{NUMBER:bytes_sent} to %{DATA:bytes_sent} and then remove that field when its value is a hyphen, to avoid mapping issues.
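So the pattern in your pipeline would become (only the bytes_sent capture changes):

"\\[%{HTTPDATE:timestamp}\\] %{IP:client_ip} %{DATA:ssl_protocol} %{WORD:ssl_client_verify} %{DATA:ssl_tls_sni} %{DATA:ssl_server_a_sig} %{GREEDYDATA:request} %{DATA:bytes_sent} %{NUMBER:request_duration}"

You can check it against one of your hyphen lines with the Simulate Pipeline API in the Kibana Dev Tools console before updating the real pipeline, something like:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "\\[%{HTTPDATE:timestamp}\\] %{IP:client_ip} %{DATA:ssl_protocol} %{WORD:ssl_client_verify} %{DATA:ssl_tls_sni} %{DATA:ssl_server_a_sig} %{GREEDYDATA:request} %{DATA:bytes_sent} %{NUMBER:request_duration}"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "[20/Oct/2023:10:30:47 +0000] 172.28.0.1 TLSv1.3 NONE - sha256WithRSAEncryption \"-\" - 194"
      }
    }
  ]
}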

Thank you. That makes a lot of sense. I think that some messages are ingested while others fail, which is consistent with what you have said.

Another issue I see is that if I look at the Log Stream in Kibana, I see:

17:18:32.000
failed to find message
17:18:41.000
failed to find message
17:18:44.000
failed to find message

I don't see errors in the Filebeat logs anymore, but I don't see my data in Elasticsearch, although I am possibly not looking in the correct place. I am slightly confused by the whole index thing - have I set it up properly in filebeat.yml? I haven't done anything additional in Elasticsearch.

I do not use the Log Stream feature in Kibana, but if I'm not wrong it uses the message field, which you are removing in your ingest pipeline.
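That is, this processor in your pipeline is what hides your messages from Log Stream:

{
  "remove": {
    "field": "message"
  }
}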

Thank you again. I think I only removed the message field because, in the examples I looked at, that appeared to be the thing to do. I can now see my log data in Log Stream.
I changed the bytes_sent field to %{DATA:bytes_sent} as you suggested, and if the value is a hyphen I remove it like below:

{
  "script": {
    "source": """
      if (ctx.bytes_sent == '-') {
        ctx.remove("bytes_sent");
      }
    """
  }
}

I wanted to use the 'mutate' processor, as I have used it in tutorials, but it didn't seem to be available. How do I make it available, or is what I have done OK?

mutate is a Logstash filter; it is not available in Elasticsearch ingest pipelines.
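Your script processor is fine, though. As an alternative, every ingest processor accepts an if condition, so the same cleanup can be written as a conditional remove, something like:

{
  "remove": {
    "field": "bytes_sent",
    "if": "ctx.bytes_sent == '-'",
    "ignore_missing": true
  }
}

Both approaches have the same effect, so keeping your script is also fine.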
