Apply grok pattern based on the log file path

Hi, here is my Logstash config file:

input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => "http://ip:9200"
    index => "%{type}-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "pwd"
  }
}

I have logs coming from multiple sources through this pipeline. I want to apply grok to the files from only one source and no pattern at all to the others. The files I want to grok come from multiple paths on that source.
So if the log file path contains "Apache" I want to apply one grok pattern, if it contains another string I want to apply a different grok pattern, and no grok for everything else. How can that be done?
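A minimal sketch of the path-based routing described above, assuming Filebeat 8.x ships the source file path in the `[log][file][path]` field (as it does in the JSON document posted later in this thread). The two grok patterns and the `tomee` substring are placeholders for illustration, not the real patterns:

```conf
filter {
  # Regex match against the file path reported by Filebeat.
  if [log][file][path] =~ /Apache/ {
    grok {
      # Placeholder: substitute the real Apache access-log pattern.
      match => { "message" => "%{GREEDYDATA:apache_line}" }
    }
  } else if [log][file][path] =~ /tomee/ {
    grok {
      # Placeholder: substitute the real service-log pattern.
      match => { "message" => "%{GREEDYDATA:service_line}" }
    }
  }
  # Events from any other path reach the output with no grok applied.
}
```

This avoids the extra Filebeat field, at the cost of tying the pipeline to directory names. Matching on a custom field such as 3dxp_tag is the more explicit alternative.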

I tried adding a tag in the Filebeat config:

- type: log
  enabled: true
  paths:
    - 'D:\3DS-apache\Apache\Apache24\logs\3dx_access.log'
  fields:
    type: 3dxp_apachetrace
    3dxp_tag: 3dxp_apache
  fields_under_root: true
- type: log
  enabled: true
  paths:
    - 'path1*'
  fields:
    type: 3dxp_servicetrace
    3dxp_tag: 3dxp_service
  fields_under_root: true

The events then contain a field called 3dxp_tag.

But when I try to use 3dxp_tag in the Logstash config, nothing seems to work; the first grok is always the one that gets applied.
Here is the Logstash config:

input {
  beats {
    port => 5044
  }
}
filter {
  if [3dxp_tag] == "3dxp_apache" {
    grok {
        match => { "message" => ["\[%{HTTPDATE:timestamp}\] \| %{NUMBER:response} \| (?<duration>%{NUMBER} %{WORD}) \| (?<bytes>%{NUMBER} %{WORD}|%{DATA}) \| %{IP:hostip} \| (%{IP:clientip}|%{DATA:clientip}) \| (%{WORD:token}|%{DATA:token}) \| (?<tag1>%{NUMBER} %{WORD}|%{DATA}) \| \"(?<method>%{WORD}) (?<url>%{URIPATHPARAM}) (?:HTTP/%{NUMBER:http_version})\""] }
    }
  }
  else if [3dxp_tag] == "3dxp_service" {
    grok {
        match => { "message" => ["\[%{HTTPDATE:timestamp}\] \| %{NUMBER:response} \| (?<duration>%{NUMBER} %{WORD}) \| (?<bytes>%{NUMBER} %{WORD}|%{DATA}) \| %{IP:hostip} \| (%{DATA:clientip}|%{IP:clientip}) \| (%{WORD:token}|%{DATA:token}) \| (?<tag1>%{NUMBER} %{WORD}|%{DATA}) \| \"(?<method>%{WORD}) (?<url>%{URIPATHPARAM}) (?:HTTP/%{NUMBER:http_version})\" \| (%{EMAILADDRESS:loginemail}|%{DATA:loginemail})"] }
    }
  }
}
output {
  elasticsearch {
    hosts => "http://ip:9200"
    index => "%{type}-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "pwd"
  }
}

The two groks differ by only one field, but the second grok isn't getting applied.

Hi @Neelam_Zanvar, do you have any events in Elasticsearch with the value 3dxp_service?
Even if the grok didn't work, you should still have an event with the tag field and a message field containing the entire message, because it couldn't be parsed.

- type: log
  enabled: true
  paths:
    - 'path1*' <!---- THAT DOES NOT LOOK RIGHT
  fields:
    type: 3dxp_servicetrace
    3dxp_tag: 3dxp_service
  fields_under_root: true

No, I don't have any events in Elasticsearch. I didn't really get your question. The configs I shared are the only configuration I have made.

But it is picking up all the files from that path after I added the *.

@Neelam_Zanvar

I think @Anton_H was asking whether you have any documents in Elasticsearch with 3dxp_tag: 3dxp_service.

And my point is that 'path1*' does not appear to be a valid path.

If it were valid, you would see documents in Elasticsearch with that log.path.

Oh, okay, sorry. Yes, there are documents with that tag.

And I have given a valid path in filebeat.yml; I only renamed it to path1* for the forum.
The path is
'D:\Dss\R2022x*\win_b64\code\tomee\logs\localhost_access_log*'

Can you show the entire JSON for that document? We want to see if there is a grok error tag.

Please paste it in as text not a screenshot!

{
  "_index": "3dxp_servicetrace-2023.05.30",
  "_id": "TchGbYgBCHIATQ4HfQFn",
  "_version": 1,
  "_score": 0,
  "_source": {
    "input": {
      "type": "log"
    },
    "message": "[30/May/2023:15:28:16.620 +0000] | 200 | 2 ms | 14 B | 172.31.40.179 |        172.31.40.179 | - | - | \"GET /3dcomment/monitoring/healthcheck HTTP/1.1\" | -",
    "host": {
      "ip": [
        "fe80::c02d:1110:7191:bb86",
        "172.31.40.179"
      ],
      "name": "EC2AMAZ-JC0BLVK",
      "mac": [
        "02-F0-50-1A-21-52"
      ],
      "os": {
        "type": "windows",
        "kernel": "10.0.17763.3406 (WinBuild.160101.0800)",
        "name": "Windows Server 2019 Datacenter",
        "family": "windows",
        "platform": "windows",
        "version": "10.0",
        "build": "17763.3406"
      },
      "id": "95a82d2c-5ad4-4b34-939b-2ede4078926e",
      "hostname": "EC2AMAZ-JC0BLVK",
      "architecture": "x86_64"
    },
    "log": {
      "file": {
        "path": "D:\\DassaultSystemes\\R2022x\\3DComment\\win_b64\\code\\tomee\\logs\\localhost_access_log..2023-05-30.txt"
      },
      "offset": 44993
    },
    "bytes": "14 B",
    "token": "-",
    "ecs": {
      "version": "8.0.0"
    },
    "duration": "2 ms",
    "remoteip": "172.31.40.179",
    "url": "/3dcomment/monitoring/healthcheck",
    "@version": "1",
    "@timestamp": "2023-05-30T15:28:35.067Z",
    "agent": {
      "ephemeral_id": "5bb38ab9-5228-4a1d-8c2a-05e8af51eb91",
      "type": "filebeat",
      "id": "2b3bda68-ddfc-41d7-98fe-b7004a2d064a",
      "name": "EC2AMAZ-JC0BLVK",
      "version": "8.7.0"
    },
    "cloud": {
      "image": {
        "id": "ami-0629500f751ea2bb5"
      },
      "service": {
        "name": "EC2"
      },
      "region": "ap-south-1",
      "availability_zone": "ap-south-1a",
      "account": {
        "id": "841773524416"
      },
      "provider": "aws",
      "instance": {
        "id": "i-0bf7a20925c81c13a"
      },
      "machine": {
        "type": "t2.2xlarge"
      }
    },
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "tag1": "-",
    "http_version": "1.1",
    "3dxp_tag": "3dxp_service",
    "type": "3dxp_servicetrace",
    "clientip": "       172.31.40.179",
    "method": "GET",
    "event": {
      "original": "[30/May/2023:15:28:16.620 +0000] | 200 | 2 ms | 14 B | 172.31.40.179 |        172.31.40.179 | - | - | \"GET /3dcomment/monitoring/healthcheck HTTP/1.1\" | -"
    },
    "date": "30/May/2023:15:28:16.620 +0000",
    "response": "200"
  },
  "fields": {
    "date": [
      "30/May/2023:15:28:16.620 +0000"
    ],
    "agent.version.keyword": [
      "8.7.0"
    ],
    "host.architecture.keyword": [
      "x86_64"
    ],
    "remoteip": [
      "172.31.40.179"
    ],
    "cloud.instance.id.keyword": [
      "i-0bf7a20925c81c13a"
    ],
    "host.name.keyword": [
      "EC2AMAZ-JC0BLVK"
    ],
    "bytes.keyword": [
      "14 B"
    ],
    "host.os.build.keyword": [
      "17763.3406"
    ],
    "host.hostname": [
      "EC2AMAZ-JC0BLVK"
    ],
    "type": [
      "3dxp_servicetrace"
    ],
    "host.mac": [
      "02-F0-50-1A-21-52"
    ],
    "cloud.availability_zone": [
      "ap-south-1a"
    ],
    "remoteip.keyword": [
      "172.31.40.179"
    ],
    "response.keyword": [
      "200"
    ],
    "host.ip.keyword": [
      "fe80::c02d:1110:7191:bb86",
      "172.31.40.179"
    ],
    "ecs.version.keyword": [
      "8.0.0"
    ],
    "host.os.version": [
      "10.0"
    ],
    "type.keyword": [
      "3dxp_servicetrace"
    ],
    "clientip": [
      "       172.31.40.179"
    ],
    "date.keyword": [
      "30/May/2023:15:28:16.620 +0000"
    ],
    "host.os.name": [
      "Windows Server 2019 Datacenter"
    ],
    "cloud.account.id.keyword": [
      "841773524416"
    ],
    "host.id.keyword": [
      "95a82d2c-5ad4-4b34-939b-2ede4078926e"
    ],
    "agent.name": [
      "EC2AMAZ-JC0BLVK"
    ],
    "host.name": [
      "EC2AMAZ-JC0BLVK"
    ],
    "host.os.version.keyword": [
      "10.0"
    ],
    "event.original": [
      "[30/May/2023:15:28:16.620 +0000] | 200 | 2 ms | 14 B | 172.31.40.179 |        172.31.40.179 | - | - | \"GET /3dcomment/monitoring/healthcheck HTTP/1.1\" | -"
    ],
    "method": [
      "GET"
    ],
    "cloud.region": [
      "ap-south-1"
    ],
    "host.os.type": [
      "windows"
    ],
    "agent.id.keyword": [
      "2b3bda68-ddfc-41d7-98fe-b7004a2d064a"
    ],
    "http_version": [
      "1.1"
    ],
    "input.type": [
      "log"
    ],
    "@version.keyword": [
      "1"
    ],
    "cloud.service.name.keyword": [
      "EC2"
    ],
    "log.offset": [
      44993
    ],
    "duration.keyword": [
      "2 ms"
    ],
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "host.architecture": [
      "x86_64"
    ],
    "cloud.provider": [
      "aws"
    ],
    "cloud.machine.type": [
      "t2.2xlarge"
    ],
    "agent.id": [
      "2b3bda68-ddfc-41d7-98fe-b7004a2d064a"
    ],
    "cloud.service.name": [
      "EC2"
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "message.keyword": [
      "[30/May/2023:15:28:16.620 +0000] | 200 | 2 ms | 14 B | 172.31.40.179 |        172.31.40.179 | - | - | \"GET /3dcomment/monitoring/healthcheck HTTP/1.1\" | -"
    ],
    "host.hostname.keyword": [
      "EC2AMAZ-JC0BLVK"
    ],
    "clientip.keyword": [
      "       172.31.40.179"
    ],
    "agent.version": [
      "8.7.0"
    ],
    "host.os.family": [
      "windows"
    ],
    "token.keyword": [
      "-"
    ],
    "cloud.machine.type.keyword": [
      "t2.2xlarge"
    ],
    "3dxp_tag": [
      "3dxp_service"
    ],
    "input.type.keyword": [
      "log"
    ],
    "tags.keyword": [
      "beats_input_codec_plain_applied"
    ],
    "duration": [
      "2 ms"
    ],
    "host.os.build": [
      "17763.3406"
    ],
    "cloud.instance.id": [
      "i-0bf7a20925c81c13a"
    ],
    "host.ip": [
      "fe80::c02d:1110:7191:bb86",
      "172.31.40.179"
    ],
    "agent.type": [
      "filebeat"
    ],
    "host.os.kernel.keyword": [
      "10.0.17763.3406 (WinBuild.160101.0800)"
    ],
    "host.os.kernel": [
      "10.0.17763.3406 (WinBuild.160101.0800)"
    ],
    "@version": [
      "1"
    ],
    "host.os.name.keyword": [
      "Windows Server 2019 Datacenter"
    ],
    "method.keyword": [
      "GET"
    ],
    "host.id": [
      "95a82d2c-5ad4-4b34-939b-2ede4078926e"
    ],
    "http_version.keyword": [
      "1.1"
    ],
    "log.file.path.keyword": [
      "D:\\DassaultSystemes\\R2022x\\3DComment\\win_b64\\code\\tomee\\logs\\localhost_access_log..2023-05-30.txt"
    ],
    "agent.type.keyword": [
      "filebeat"
    ],
    "agent.ephemeral_id.keyword": [
      "5bb38ab9-5228-4a1d-8c2a-05e8af51eb91"
    ],
    "tag1": [
      "-"
    ],
    "cloud.region.keyword": [
      "ap-south-1"
    ],
    "cloud.image.id.keyword": [
      "ami-0629500f751ea2bb5"
    ],
    "host.mac.keyword": [
      "02-F0-50-1A-21-52"
    ],
    "tag1.keyword": [
      "-"
    ],
    "agent.name.keyword": [
      "EC2AMAZ-JC0BLVK"
    ],
    "message": [
      "[30/May/2023:15:28:16.620 +0000] | 200 | 2 ms | 14 B | 172.31.40.179 |        172.31.40.179 | - | - | \"GET /3dcomment/monitoring/healthcheck HTTP/1.1\" | -"
    ],
    "url": [
      "/3dcomment/monitoring/healthcheck"
    ],
    "url.keyword": [
      "/3dcomment/monitoring/healthcheck"
    ],
    "cloud.availability_zone.keyword": [
      "ap-south-1a"
    ],
    "token": [
      "-"
    ],
    "cloud.image.id": [
      "ami-0629500f751ea2bb5"
    ],
    "host.os.family.keyword": [
      "windows"
    ],
    "@timestamp": [
      "2023-05-30T15:28:35.067Z"
    ],
    "host.os.type.keyword": [
      "windows"
    ],
    "host.os.platform": [
      "windows"
    ],
    "cloud.account.id": [
      "841773524416"
    ],
    "host.os.platform.keyword": [
      "windows"
    ],
    "response": [
      "200"
    ],
    "bytes": [
      "14 B"
    ],
    "log.file.path": [
      "D:\\DassaultSystemes\\R2022x\\3DComment\\win_b64\\code\\tomee\\logs\\localhost_access_log..2023-05-30.txt"
    ],
    "cloud.provider.keyword": [
      "aws"
    ],
    "event.original.keyword": [
      "[30/May/2023:15:28:16.620 +0000] | 200 | 2 ms | 14 B | 172.31.40.179 |        172.31.40.179 | - | - | \"GET /3dcomment/monitoring/healthcheck HTTP/1.1\" | -"
    ],
    "agent.ephemeral_id": [
      "5bb38ab9-5228-4a1d-8c2a-05e8af51eb91"
    ],
    "3dxp_tag.keyword": [
      "3dxp_service"
    ]
  }
}

Is the syntax of the if condition correct?

Yeah, we are looking at that. At first look it seems correct.

Can you provide the JSON for the other type that is parsed and show us that, please?

@Neelam_Zanvar WAIT, it IS getting parsed. Look at the JSON closely.

This IS parsed, maybe not correctly, but it IS parsed!
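One visible oddity in the JSON above is that clientip was captured with leading spaces ("       172.31.40.179"). Whether that is the "not correctly" part meant here is an assumption. A small sketch of one way to trim it after grok, using the mutate filter's strip option:

```conf
filter {
  mutate {
    # Remove leading and trailing whitespace from the captured value.
    strip => ["clientip"]
  }
}
```

Placed after the grok block, this leaves the grok pattern itself untouched. Tightening the pattern so it consumes the padding before the address would be an alternative.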

Yes, it is getting parsed and working as expected. Thank you for the quick response and for your time. I have another question: all the fields parsed by the grok are created as text. For example, HTTPDATE:timestamp is saved as text. How can I save it as a date using only the Logstash pipeline?

@Neelam_Zanvar I think this is the same issue we are discussing in your Kibana visualisation question.
I will try to answer your questions but am short on time at the moment. Could you please answer the question regarding the version of Elastic you are using?

It's 8.7.

For example HTTPDATE:timestamp is getting saved as text, How can i save it in date format using logstash pipeline only?

Use the date plugin for conversion:

  date {
    match => [ "date", "dd/MMM/yyyy:HH:mm:ss.SSS Z" ]
    # timezone => "Asia/Dubai"
    target => "@timestamp"
  }

Also, your data view (index pattern) should be recreated in Kibana.

Can you please show me how I can use it in my existing grok?

  grok {
    match => { "message" => ["\[%{HTTPDATE:date:date}\] \| %{NUMBER:response} \| (?<duration>%{NUMBER} %{WORD}) \| (?<bytes>%{NUMBER} %{WORD}|%{DATA}) \| %{IP:remoteip} \| (%{DATA:clientip}|%{IP:clientip}) \| (%{WORD:token}|%{DATA:token}) \| (?<tag1>%{NUMBER} %{WORD}|%{DATA}) \| \"(?<method>%{WORD}) (?<url>%{URIPATHPARAM}) (?:HTTP/%{NUMBER:http_version})\" \| (%{EMAILADDRESS:email}|%{WORD:email}|%{DATA:email})"] }
  }

You cannot do that directly in grok.
From the documentation: "For example %{NUMBER:num:int} which converts the num semantic from a string to an integer. Currently the only supported conversions are int and float."
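As a small illustration of the conversion that grok does support, the sketch below stores the status code as an integer. The pattern is deliberately shortened to the first two columns of the log line and is not the full pattern from this thread:

```conf
filter {
  grok {
    # ":int" asks grok to emit response as an integer instead of a string.
    match => { "message" => "\[%{HTTPDATE:date}\] \| %{NUMBER:response:int} \|" }
  }
}
```

If the index already maps response as text, the new type will most likely only show up in an index created after the change.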

Okay, thank you.

How can it be done then?
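The date filter suggested earlier is not part of the grok pattern; it runs as a separate step after it. A sketch of the two combined inside the existing conditional, assuming the timestamp is captured into a field named date as in the JSON above. The grok pattern is shortened here, and the rest field is a hypothetical stand-in for the remaining columns:

```conf
filter {
  if [3dxp_tag] == "3dxp_service" {
    grok {
      # Capture the timestamp as a plain string first (no ":date" suffix).
      match => { "message" => "\[%{HTTPDATE:date}\] \| %{GREEDYDATA:rest}" }
    }
    date {
      # Parses e.g. "30/May/2023:15:28:16.620 +0000".
      match  => [ "date", "dd/MMM/yyyy:HH:mm:ss.SSS Z" ]
      target => "@timestamp"
    }
  }
}
```

With target set to "@timestamp", the event time becomes the time from the log line rather than the time Filebeat read it. If parsing fails, the date filter tags the event with _dateparsefailure, which is an easy thing to search for in Elasticsearch.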