Grok pattern

Hello everyone,

I have an error in my logs.
I created a grok pattern and added it to the agent policy, but it still doesn't work.
I tested the grok pattern with just a few fields:


<%{NUMBER:syslog_pri}>%{NUMBER:syslog_ver} %{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:hostname} %{WORD:app_name} 


Provided Grok expressions do not match field value: [<14>1 2025-03-09T16:07:46.493430+01:00 xxx Veeam_MP - - [origin enterpriseId=\"31023\"] [categoryId=0 instanceId=10050 OibID=\"eb5c85e0-1066-40e4-a5d1-ef98ad19a502\" OriginalOibID=\"eb5c85e0-1066-40e4-a5d1-ef98ad19a502\" VmRef=\"VMxa/ADx/x.vmx\" VmName=\"xMS02\" ServerName=\"Unknown VMware VI host\" DateTime=\"03/09/2025 15:07:46\" IsCorrupted=\"False\" Platform=\"0\" StorageSize=\"0\" RepositoryID=\"00000000-0000-0000-0000-000000000000\" IsFull=\"False\" VbrHostName=\"xxx.teknik.local\" VbrVersion=\"12.2.0.334\" Version=\"1\" Description=\"Restore point for VM [x02] has been removed.\"]]


Does anyone have an idea what could be wrong here?

Hi @arcsons

Go to Discover, get the JSON of one of the documents that is not parsing, and post it here.

This is the problem:

The pattern does not match because it is incomplete: the message contains extra fields that the pattern does not cover.

If you want to test incrementally, put %{GREEDYDATA:message_details} at the end so that the rest of the message is captured.

<%{NUMBER:syslog_pri}>%{NUMBER:syslog_ver} %{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:hostname} %{WORD:app_name} %{GREEDYDATA:message_details}
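The incremental approach can be sketched outside Elasticsearch. The Python below is only an illustration: the GROK table holds simplified stand-ins for the real grok definitions (an assumption, the real ones are stricter), and incremental_check is a hypothetical helper that adds one piece at a time, always ending in GREEDYDATA, and stops at the first piece that fails.

```python
import re

# Simplified stand-ins for the grok patterns used above. These are an
# assumption for illustration only; the real grok definitions are stricter.
GROK = {
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?",
    "HOSTNAME": r"[0-9A-Za-z][0-9A-Za-z._-]*",
    "WORD": r"\w+",
    "GREEDYDATA": r".*",
}


def grok_to_regex(pattern):
    """Expand each %{NAME:field} into a named regex group using GROK."""
    return re.sub(
        r"%\{(\w+):(\w+)\}",
        lambda m: "(?P<%s>%s)" % (m.group(2), GROK[m.group(1)]),
        pattern,
    )


def incremental_check(steps, message):
    """Add one piece at a time, always ending in GREEDYDATA; stop at the first failure."""
    prefix, match = "", None
    for step in steps:
        prefix += step
        match = re.match(grok_to_regex(prefix + "%{GREEDYDATA:message_details}"), message)
        print("OK  " if match else "FAIL", repr(prefix))
        if match is None:
            break
    return match


# Shortened copy of the message from the error above.
message = ('<14>1 2025-03-09T16:07:46.493430+01:00 xxx Veeam_MP - - '
           '[origin enterpriseId="31023"] [categoryId=0 instanceId=10050]')
steps = [
    "<%{NUMBER:syslog_pri}>",
    "%{NUMBER:syslog_ver} ",
    "%{TIMESTAMP_ISO8601:timestamp} ",
    "%{HOSTNAME:hostname} ",
    "%{WORD:app_name} ",
]
result = incremental_check(steps, message)
print(result.groupdict() if result else None)
```

Whatever lands in message_details is what the next piece of the pattern has to describe.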

Go to Kibana Dev Tools

Run

GET _ingest/pipeline/logs-vsphere.log@custom

Post the output here.

Perhaps we can help.

The best way to debug an ingest pipeline is with the _simulate API, which is what we will do:

POST _ingest/pipeline/logs-vsphere.log@custom/_simulate
{
  "docs": [
    {
      "_source" : 
      {
        <<<YOUR DOC HERE>>>> 
      }
    }
  ]
}
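To make the expected shape concrete, here is a small Python sketch that builds the _simulate body: a "docs" list where each entry holds only the document body under "_source". It assumes the custom pipeline for the vsphere.log dataset is named logs-vsphere.log@custom (as in the later posts) and that the raw syslog line sits in a "message" field; the example document is hypothetical.

```python
import json


def simulate_body(*sources):
    """Wrap one or more document bodies in the shape _simulate expects."""
    return {"docs": [{"_source": source} for source in sources]}


# Hypothetical example document: assumes the raw syslog line is in "message".
doc = {"message": "<14>1 2025-03-09T16:07:46.493430+01:00 xxx Veeam_MP - - [origin enterpriseId=\"31023\"]"}
body = simulate_body(doc)

# Paste the printed request into Kibana Dev Tools.
print("POST _ingest/pipeline/logs-vsphere.log@custom/_simulate")
print(json.dumps(body, indent=2))
```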

Hi Stephen,
Hope you're doing well!

I tested the Grok pattern, and it worked in the test with all fields. However, when I started from scratch and applied it to the pipeline, it never worked.

POST _ingest/pipeline/logs-vsphere.log@custom/_simulate
{
  "docs": [
    {
      "_source": {
        "_index": ".ds-logs-vsphere.log-default-2025.03.02-000005",
        "_id": "dtPIf5UBYuBgAluqaK6T",
        "_version": 1,
        "_score": 0,
        "_source": {
          "input": {
            "type": "tcp"
          },
          "agent": {
            "name": "x.xyz.local",
            "id": "378ef5d0-3bfe-4304-8394-c96722767888",
            "type": "filebeat",
            "ephemeral_id": "3b51a122-db57-4de0-a2e6-d64c43fdb43b",
            "version": "8.15.3"
          },
          "@timestamp": "2025-03-10T11:19:51.334Z",
          "ecs": {
            "version": "8.11.0"
          },
          "log": {
            "source": {
              "address": "10.10.0.83:58332"
            }
          },
          "data_stream": {
            "namespace": "default",
            "type": "logs",
            "dataset": "vsphere.log"
          },
          "elastic_agent": {
            "id": "378ef5d0-3bfe-4304-8394-c96722767888",
            "version": "8.15.3",
            "snapshot": false
          },
          "host": {
            "hostname": "x.xyz.local",
            "os": {
              "kernel": "6.8.0-47-generic",
              "codename": "noble",
              "name": "Ubuntu",
              "type": "linux",
              "family": "debian",
              "version": "24.04.1 LTS (Noble Numbat)",
              "platform": "ubuntu"
            },
            "containerized": false,
            "ip": [
              "10.10.0.237",
              "fe80::250:56ff:fe9a:25bb"
            ],
            "name": "x.xyz.local",
            "id": "21aa5bdee5e2419cba57751eb5c6887c",
            "mac": [
              "00-50-56-9A-25-BB"
            ],
            "architecture": "x86_64"
          },
          "event": {
            "agent_id_status": "verified",
            "ingested": "2025-03-10T11:19:55Z",
            "dataset": "vsphere.log"
          },
          "error": {
            "message": [
              "Provided Grok expressions do not match field value: [<14>1 2025-03-10T12:19:51.315514+01:00 VEEAMMGMT01 Veeam_MP - - [origin enterpriseId=\\\"31023\\\"] [categoryId=0 instanceId=450 JobSessionID=\\\"01bed5e1-40f7-4ee4-a960-4aaaf2c26b71\\\" JobID=\\\"8f97b7bb-86ca-430b-a159-950f93a21434\\\" JobType=\\\"63\\\" TaskSessionID=\\\"eecb32ca-4481-4807-8be8-8d082c4c8d71\\\" OibID=\\\"1fe7254d-fe6f-42aa-870c-cff1d751daa4\\\" OriginalOibID=\\\"6612eaff-3251-4b59-87eb-311f2ed65012\\\" CreationTime=\\\"03/10/2025 11:15:34\\\" Status=\\\"5\\\" SourceHostName=\\\"vmvc.xyz.local\\\" VmRef=\\\"vm-866847\\\" VmName=\\\"ADFS03\\\" TransferredGb=\\\"22.805\\\" Platform=\\\"0\\\" IsRetry=\\\"False\\\" VbrHostName=\\\"VEEAMMGMT01.xyz.local\\\" VbrVersion=\\\"12.2.0.334\\\" Version=\\\"1\\\" Description=\\\"VM ADFS03 task has finished with 'InProgress' state.\\\"]]"
            ]
          },
          "tags": [
            "vmware-vsphere"
          ]
        },
        "fields": {
          "elastic_agent.version": [
            "8.15.3"
          ],
          "host.os.name.text": [
            "Ubuntu"
          ],
          "host.name.text": [
            "x.xyz.local"
          ],
          "host.hostname": [
            "x.xyz.local"
          ],
          "host.mac": [
            "00-50-56-9A-25-BB"
          ],
          "host.ip": [
            "10.10.0.237",
            "fe80::250:56ff:fe9a:25bb"
          ],
          "agent.type": [
            "filebeat"
          ],
          "event.module": [
            "vsphere"
          ],
          "agent.name.text": [
            "x.xyz.local"
          ],
          "host.os.version": [
            "24.04.1 LTS (Noble Numbat)"
          ],
          "host.os.kernel": [
            "6.8.0-47-generic"
          ],
          "host.os.name": [
            "Ubuntu"
          ],
          "agent.name": [
            "x.xyz.local"
          ],
          "elastic_agent.snapshot": [
            false
          ],
          "host.name": [
            "x.xyz.local"
          ],
          "event.agent_id_status": [
            "verified"
          ],
          "host.id": [
            "21aa5bdee5e2419cba57751eb5c6887c"
          ],
          "host.os.type": [
            "linux"
          ],
          "elastic_agent.id": [
            "378ef5d0-3bfe-4304-8394-c96722767888"
          ],
          "data_stream.namespace": [
            "default"
          ],
          "host.os.codename": [
            "noble"
          ],
          "input.type": [
            "tcp"
          ],
          "data_stream.type": [
            "logs"
          ],
          "tags": [
            "vmware-vsphere"
          ],
          "host.architecture": [
            "x86_64"
          ],
          "event.ingested": [
            "2025-03-10T11:19:55.000Z"
          ],
          "@timestamp": [
            "2025-03-10T11:19:51.334Z"
          ],
          "agent.id": [
            "378ef5d0-3bfe-4304-8394-c96722767888"
          ],
          "ecs.version": [
            "8.11.0"
          ],
          "host.containerized": [
            false
          ],
          "host.os.platform": [
            "ubuntu"
          ],
          "error.message": [
            "Provided Grok expressions do not match field value: [<14>1 2025-03-10T12:19:51.315514+01:00 VEEAMMGMT01 Veeam_MP - - [origin enterpriseId=\\\"31023\\\"] [categoryId=0 instanceId=450 JobSessionID=\\\"01bed5e1-40f7-4ee4-a960-4aaaf2c26b71\\\" JobID=\\\"8f97b7bb-86ca-430b-a159-950f93a21434\\\" JobType=\\\"63\\\" TaskSessionID=\\\"eecb32ca-4481-4807-8be8-8d082c4c8d71\\\" OibID=\\\"1fe7254d-fe6f-42aa-870c-cff1d751daa4\\\" OriginalOibID=\\\"6612eaff-3251-4b59-87eb-311f2ed65012\\\" CreationTime=\\\"03/10/2025 11:15:34\\\" Status=\\\"5\\\" SourceHostName=\\\"vmvc.xyz.local\\\" VmRef=\\\"vm-866847\\\" VmName=\\\"ADFS03\\\" TransferredGb=\\\"22.805\\\" Platform=\\\"0\\\" IsRetry=\\\"False\\\" VbrHostName=\\\"VEEAMMGMT01.xyz.local\\\" VbrVersion=\\\"12.2.0.334\\\" Version=\\\"1\\\" Description=\\\"VM ADFS03 task has finished with 'InProgress' state.\\\"]]"
          ],
          "log.source.address": [
            "10.10.0.83:58332"
          ],
          "data_stream.dataset": [
            "vsphere.log"
          ],
          "agent.ephemeral_id": [
            "3b51a122-db57-4de0-a2e6-d64c43fdb43b"
          ],
          "agent.version": [
            "8.15.3"
          ],
          "host.os.family": [
            "debian"
          ],
          "event.dataset": [
            "vsphere.log"
          ]
        }
      }
    }
  ]
}

That gives me this error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "unexpected metadata [_id:dtPIf5UBYuBgAluqaK6T, _index:.ds-logs-vsphere.log-default-2025.03.02-000005, _version:1] in source"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "unexpected metadata [_id:dtPIf5UBYuBgAluqaK6T, _index:.ds-logs-vsphere.log-default-2025.03.02-000005, _version:1] in source"
  },
  "status": 400
}

Hi @arcsons

The problem is that you are pasting in the whole document, not just the "_source" field, which is not correct.

You have:

POST _ingest/pipeline/logs-vsphere.log@custom/_simulate
{
  "docs": [
    {
      "_source": { <<<< Source Twice
        "_index": ".ds-logs-vsphere.log-default-2025.03.02-000005",  << NOT SOURCE
        "_id": "dtPIf5UBYuBgAluqaK6T", << NOT SOURCE
        "_version": 1, << NOT SOURCE
        "_score": 0, << NOT SOURCE
        "_source": { <<<<< SOURCE STARTS HERE
          "input": {
            "type": "tcp"
          },
          "agent": {
            "name": "x.xyz.local",
......
          },
          "event": {
            "agent_id_status": "verified",
            "ingested": "2025-03-10T11:19:55Z",
            "dataset": "vsphere.log"
          },
          "error": {
            "message": [
              "Provided Grok expressions do not match field value: [<14>1 2025-03-10T12:19:51.315514+01:00 VEEAMMGMT01 Veeam_MP - - [origin enterpriseId=\\\"31023\\\"] [categoryId=0 instanceId=450 JobSessionID=\\\"01bed5e1-40f7-4ee4-a960-4aaaf2c26b71\\\" JobID=\\\"8f97b7bb-86ca-430b-a159-950f93a21434\\\" JobType=\\\"63\\\" TaskSessionID=\\\"eecb32ca-4481-4807-8be8-8d082c4c8d71\\\" OibID=\\\"1fe7254d-fe6f-42aa-870c-cff1d751daa4\\\" OriginalOibID=\\\"6612eaff-3251-4b59-87eb-311f2ed65012\\\" CreationTime=\\\"03/10/2025 11:15:34\\\" Status=\\\"5\\\" SourceHostName=\\\"vmvc.xyz.local\\\" VmRef=\\\"vm-866847\\\" VmName=\\\"ADFS03\\\" TransferredGb=\\\"22.805\\\" Platform=\\\"0\\\" IsRetry=\\\"False\\\" VbrHostName=\\\"VEEAMMGMT01.xyz.local\\\" VbrVersion=\\\"12.2.0.334\\\" Version=\\\"1\\\" Description=\\\"VM ADFS03 task has finished with 'InProgress' state.\\\"]]"
            ]
          },
          "tags": [
            "vmware-vsphere"
          ]
        }, <<<<< ENDS HERE 
        "fields": { << DON'T INCLUDE FIELDS
          "elastic_agent.version": [
            "8.15.3"
         ....
        }
      }
    }
  ]
}

It should look like this:

POST _ingest/pipeline/logs-vsphere.log@custom/_simulate
{
  "docs": [
    {
      "_source": {
          "input": {
            "type": "tcp"
          },
          "agent": {
            "name": "x.xyz.local"
.....
          "event": {
            "agent_id_status": "verified",
            "ingested": "2025-03-10T11:19:55Z",
            "dataset": "vsphere.log"
          },
          "error": {
            "message": [
              "Provided Grok expressions do not match field value: [<14>1 2025-03-10T12:19:51.315514+01:00 VEEAMMGMT01 Veeam_MP - - [origin enterpriseId=\\\"31023\\\"] [categoryId=0 instanceId=450 JobSessionID=\\\"01bed5e1-40f7-4ee4-a960-4aaaf2c26b71\\\" JobID=\\\"8f97b7bb-86ca-430b-a159-950f93a21434\\\" JobType=\\\"63\\\" TaskSessionID=\\\"eecb32ca-4481-4807-8be8-8d082c4c8d71\\\" OibID=\\\"1fe7254d-fe6f-42aa-870c-cff1d751daa4\\\" OriginalOibID=\\\"6612eaff-3251-4b59-87eb-311f2ed65012\\\" CreationTime=\\\"03/10/2025 11:15:34\\\" Status=\\\"5\\\" SourceHostName=\\\"vmvc.xyz.local\\\" VmRef=\\\"vm-866847\\\" VmName=\\\"ADFS03\\\" TransferredGb=\\\"22.805\\\" Platform=\\\"0\\\" IsRetry=\\\"False\\\" VbrHostName=\\\"VEEAMMGMT01.xyz.local\\\" VbrVersion=\\\"12.2.0.334\\\" Version=\\\"1\\\" Description=\\\"VM ADFS03 task has finished with 'InProgress' state.\\\"]]"
            ]
          },
          "tags": [
            "vmware-vsphere"
          ]
      }
    }
  ]
}
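The same fix can be scripted. _simulate accepts metadata such as _index and _id next to "_source" in each doc, but not inside it, which is what the "unexpected metadata ... in source" error above is complaining about. The hypothetical helper below takes JSON copied from Discover and, if it looks like a full hit, keeps only the inner "_source"; the set of wrapper keys is an assumption based on the document pasted above.

```python
import json

# Keys Discover shows around the document body (assumption, based on the
# document pasted above). "fields" is a search-time view of the same data.
WRAPPER_KEYS = {"_index", "_id", "_version", "_score", "fields"}


def source_from_discover_json(text):
    """Return just the document body from JSON copied out of Discover."""
    hit = json.loads(text)
    if "_source" in hit and WRAPPER_KEYS.intersection(hit):
        return hit["_source"]  # full hit: keep only the inner _source
    return hit  # already just the body


# Shortened version of the document from the post above.
copied = '''{
  "_index": ".ds-logs-vsphere.log-default-2025.03.02-000005",
  "_id": "dtPIf5UBYuBgAluqaK6T",
  "_version": 1,
  "_score": 0,
  "_source": {"input": {"type": "tcp"}, "tags": ["vmware-vsphere"]},
  "fields": {"input.type": ["tcp"]}
}'''
body = {"docs": [{"_source": source_from_discover_json(copied)}]}
print(json.dumps(body, indent=2))
```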

Is the field message or error.message?
Also, can you see any errors in this pattern:

<%{NUMBER:syslog_pri}>%{NUMBER:syslog_ver} %{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:hostname} %{WORD:app_name} %{GREEDYDATA:message_details}

It feels like the Grok Debugger works differently when testing in the pipeline than in Dev Tools.


Hi @arcsons

We can't debug anything from screenshots; they are hard to read. I suggested you use Dev Tools, which in my opinion is much better for debugging than the Ingest Pipeline UI.

So like the last time, let's back up and figure out what you are trying to do.

A) Are the logs you are trying to ingest vsphere logs? Yes / No
B) If so, are you using the vsphere integration? Yes / No
C) It seems like you are trying to do this, but the message is not parsing correctly? Yes / No
D) What version of the vsphere integration are you using?

Please answer these questions.

If so, please turn on Preserve Original Event. That populates event.original, which is the field we will parse: it will contain the original message, and only then can you actually work on your issue. The original message is copied to event.original, but it is not preserved unless you enable that setting.

Once you have that, please provide a sample document. It should have the correct data so we can help.

Hi Stephen,

I have switched to Logstash to make troubleshooting easier.

Log message:

<14>1 2025-03-13T19:13:28.384816+01:00 veeamx Veeam_MP - - [origin enterpriseId="31023"] [categoryId=0 instanceId=10050 OibID="e1c3f571-3331-4812-b340-0ccea8bdc355" OriginalOibID="e1c3f571-3331-4812-b340-0ccea8bdc355" VmRef="vm-1115094" VmName="TESTWIN01" ServerName="vmvc.x.lan" DateTime="03/13/2025 18:13:28" IsCorrupted="False" Platform="0" StorageSize="0" RepositoryID="00000000-0000-0000-0000-000000000000" IsFull="False" VbrHostName="veeamx.x.lan" VbrVersion="12.2.0.334" Version="1" Description="Restore point for VM [TESTWIN01] has been removed."]

The grok pattern works fine in the Grok Debugger:

<%{INT:priority}\>%{INT:syslog_version} %{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:host} %{WORD:application} - - \[origin\senterpriseId=\\"%{NUMBER:enterprise_id}\\"\] \[categoryId=%{NUMBER:category_id} instanceId=%{NUMBER:instance_id} JobSessionID=\\"%{DATA:job_session_id}" JobID=\\"%{DATA:job_id}\" JobResult=\\"%{DATA:job_result}\" JobType=\\"%{DATA:job_type}\" Platform=\\"%{DATA:platform}\" WillBeRetried=\\"%{DATA:will_be_retried}\" VbrHostName=\\"%{DATA:vbr_host_name}\" VbrVersion=\\"%{DATA:vbr_version}\" Version=\\"%{DATA:version}\" Description=\\"%{DATA:description}\"\] 

logstash.conf:
     grok {
      match => {
        "message" => [
                   "<%{INT:priority}\>%{INT:syslog_version} %{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:host} %{WORD:application} - - \[origin\senterpriseId=\\"%{NUMBER:enterprise_id}\\"\] \[categoryId=%{NUMBER:category_id} instanceId=%{NUMBER:instance_id} JobSessionID=\\"%{DATA:job_session_id}" JobID=\\"%{DATA:job_id}\" JobResult=\\"%{DATA:job_result}\" JobType=\\"%{DATA:job_type}\" Platform=\\"%{DATA:platform}\" WillBeRetried=\\"%{DATA:will_be_retried}\" VbrHostName=\\"%{DATA:vbr_host_name}\" VbrVersion=\\"%{DATA:vbr_version}\" Version=\\"%{DATA:version}\" Description=\\"%{DATA:description}\"\]"
        ]
      }
    }

Logstash doesn't like the syntax in logstash.conf:


e Veeam syslog messages\n    grok {\n      match => {\n        \"message\" => [\n                   \"<%{INT:priority}\\>%{INT:syslog_version} %{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:host} %{WORD:application} - - \\[origin\\senterpriseId=\\\\\"%{NUMBER:enterprise_id}\\\\\"\\] \\[categoryId=%{NUMBER:category_id} instanceId=%{NUMBER:instance_id} JobSessionID=\\\\\"%{DATA:job_session_id}\" ", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:32:in `compile_imperative'", "org/logstash/execution/AbstractPipelineExt.java:294:in `initialize'", "org/logstash/execution/AbstractPipelineExt.java:227:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:47:in `initialize'", "org/jruby/RubyClass.java:949:in `new'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:50:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:420:in `block in converge_state'"]}
[2025-03-13T20:27:02,721][INFO ][logstash.runner          ] Logstash shut down.
[2025-03-13T20:27:02,726][FATAL][org.logstash.Logstash    ] Logstash stopped processing because of an error: (SystemExit) exit
org.jruby.exceptions.SystemExit: (SystemExit) exit
        at org.jruby.RubyKernel.exit(org/jruby/RubyKernel.java:924) ~[jruby.jar:?]
        at org.jruby.RubyKernel.exit(org/jruby/RubyKernel.java:883) ~[jruby.jar:?]
        at usr.share.logstash.lib.bootstrap.environment.<main>(/usr/share/logstash/lib/bootstrap/environment.rb:90) ~[?:?]
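The config excerpt in the error message stops right after %{DATA:job_session_id}, which hints at where Logstash stopped reading the string. My understanding (an assumption, not checked against the Logstash grammar source) is that inside a double-quoted config string a backslash-quote pair does not end the string, but any other " does. The hypothetical helper first_unescaped_quote applies that assumed rule to the start of the pattern:

```python
def first_unescaped_quote(text):
    """Return the index of the first '"' that is not part of a backslash-quote pair, or -1.

    Assumed rule: a backslash followed by a quote stays inside the string,
    while any other double quote would close it.
    """
    i = 0
    while i < len(text):
        if text.startswith('\\"', i):
            i += 2  # backslash-quote pair: still inside the string
        elif text[i] == '"':
            return i  # bare quote: this would end the double-quoted string
        else:
            i += 1
    return -1


# Start of the pattern from logstash.conf (the text between the outer quotes).
pattern = (
    r'<%{INT:priority}\>%{INT:syslog_version} %{TIMESTAMP_ISO8601:timestamp} '
    r'%{HOSTNAME:host} %{WORD:application} - - '
    r'\[origin\senterpriseId=\\"%{NUMBER:enterprise_id}\\"\] '
    r'\[categoryId=%{NUMBER:category_id} instanceId=%{NUMBER:instance_id} '
    r'JobSessionID=\\"%{DATA:job_session_id}" JobID=\\"%{DATA:job_id}\"'
)

idx = first_unescaped_quote(pattern)
print(idx, repr(pattern[idx - 22 : idx + 7]))
```

If that is the cause, escaping that quote like the others should get past the syntax error. Whether the pattern then matches is a separate question: the sample log message uses plain quotes rather than backslash-quote, and its keys (OibID, VmRef, ...) differ from the ones in the pattern (JobSessionID, JobResult, ...).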

The end of the log message appears to be a list of key-value pairs. If this list is static, with all entries always appearing in the same order, you can use grok for it, but I would instead recommend breaking your parsing up into multiple steps.

In the first step, you use a grok or dissect processor to separate the full key-value list into a single temporary field. You then use a KV processor to parse that field, possibly preceded by another processor if the list needs to be altered or cleaned up before KV processing. The KV processor can parse key-value pairs that appear in any order and does not fail if some are missing.
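To illustrate the two-step idea outside Logstash, here is a Python sketch using the sample message posted above. It is not the grok/dissect or kv processor itself: both regexes are assumptions about what each step has to handle, in particular that quoted values such as DateTime or Description contain spaces (and even [ and ]), so the pair parser must not simply split on every space.

```python
import re

# The sample message from the Logstash post above.
message = (
    '<14>1 2025-03-13T19:13:28.384816+01:00 veeamx Veeam_MP - - '
    '[origin enterpriseId="31023"] [categoryId=0 instanceId=10050 '
    'OibID="e1c3f571-3331-4812-b340-0ccea8bdc355" '
    'OriginalOibID="e1c3f571-3331-4812-b340-0ccea8bdc355" VmRef="vm-1115094" '
    'VmName="TESTWIN01" ServerName="vmvc.x.lan" DateTime="03/13/2025 18:13:28" '
    'IsCorrupted="False" Platform="0" StorageSize="0" '
    'RepositoryID="00000000-0000-0000-0000-000000000000" IsFull="False" '
    'VbrHostName="veeamx.x.lan" VbrVersion="12.2.0.334" Version="1" '
    'Description="Restore point for VM [TESTWIN01] has been removed."]'
)

# Step 1 (the grok/dissect step): split the fixed header from the key-value list.
HEADER = re.compile(
    r'^<(?P<pri>\d+)>(?P<ver>\d+) (?P<timestamp>\S+) (?P<host>\S+) (?P<app>\S+) - - '
    r'\[(?P<origin>[^\]]*)\] \[(?P<kv_pairs>.*)\]$'
)

# Step 2 (the kv step): key=value pairs in any order; a value is either
# double-quoted (may contain spaces) or a run of non-space characters.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_kv(text):
    """Return {key: value}; bare values win, otherwise the quoted value is used."""
    return {key: bare if bare else quoted for key, quoted, bare in PAIR.findall(text)}


header = HEADER.match(message).groupdict()
fields = parse_kv(header.pop("kv_pairs"))
print(header)
print(fields["VmName"], "|", fields["Description"])
```

Because parse_kv only looks for key=value shapes, the order of the keys does not matter and missing keys are simply absent, which is the property described above for the KV processor.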

Hi Christian,

Thank you for suggesting new ideas on how to solve this.

I haven’t used any processor other than Grok. From what I understand, it should be something like this, but it also returned a grok error: "_grokparsefailure".

Is there anything I might be missing?

grok {
  match => {
    "message" => "<%{INT:syslog_pri}>1 %{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:hostname} %{DATA:app} - - \\[%{DATA:origin}\\] \\[%{GREEDYDATA:kv_pairs}\\]"
  }
}

kv {
  source => "kv_pairs"
  field_split => " "
  value_split => "="
}
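One thing worth checking in this grok line is the doubled backslashes. My understanding is that Logstash only unescapes backslashes in config strings when config.support_escapes is enabled (it defaults to false), so \\[ would reach grok as two backslashes, and in a regex \\ means one literal backslash, which the message does not contain. The Python sketch below uses simplified stand-ins for the grok patterns (an assumption) and a shortened copy of the sample message to compare the two spellings:

```python
import re

# Shortened copy of the sample message.
message = (
    '<14>1 2025-03-13T19:13:28.384816+01:00 veeamx Veeam_MP - - '
    '[origin enterpriseId="31023"] [categoryId=0 instanceId=10050 '
    'Description="Restore point for VM [TESTWIN01] has been removed."]'
)

# The grok line with simplified stand-ins: \d+ for INT, \S+ for TIMESTAMP_ISO8601
# and HOSTNAME, .*? for DATA, .* for GREEDYDATA.
HEADER = r'<(?P<syslog_pri>\d+)>1 (?P<timestamp>\S+) (?P<hostname>\S+) (?P<app>.*?) - - '

# As written in logstash.conf, assuming the backslashes pass through unchanged:
# "\\" requires a literal backslash in the message before each bracket.
doubled = HEADER + r'\\[(?P<origin>.*?)\\] \\[(?P<kv_pairs>.*)\\]'
# With single backslashes the brackets themselves are escaped.
single = HEADER + r'\[(?P<origin>.*?)\] \[(?P<kv_pairs>.*)\]'

print("doubled:", re.search(doubled, message))
match = re.search(single, message)
print("single: ", match.groupdict())
```

If that assumption holds for your setup, writing \[ and \] with a single backslash in logstash.conf is a cheap change to try.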

I am not sure what the exact definition of TIMESTAMP_ISO8601 is, so it may be worthwhile to change it to DATA and see if that makes a difference.
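For reference, the common grok-patterns file defines TIMESTAMP_ISO8601 roughly as %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?. The Python transcription below is simplified (atomic groups and lookarounds dropped), so treat it as an approximation; it checks the timestamps seen in this thread:

```python
import re

# Approximate transcriptions of the grok building blocks (assumption: simplified
# from the standard grok-patterns definitions, not copied verbatim).
YEAR = r"(?:\d\d){1,2}"
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"
HOUR = r"(?:2[0-3]|[01]?[0-9])"
MINUTE = r"(?:[0-5][0-9])"
SECOND = r"(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)"
ISO8601_TIMEZONE = rf"(?:Z|[+-]{HOUR}(?::?{MINUTE}))"
TIMESTAMP_ISO8601 = (
    rf"{YEAR}-{MONTHNUM}-{MONTHDAY}[T ]{HOUR}:?{MINUTE}(?::?{SECOND})?{ISO8601_TIMEZONE}?"
)

for ts in ["2025-03-13T19:13:28.384816+01:00",   # syslog header timestamp
           "2025-03-10T12:19:51.315514+01:00",   # from the earlier error
           "03/13/2025 18:13:28"]:               # the DateTime value, not ISO 8601
    print(ts, "->", bool(re.fullmatch(TIMESTAMP_ISO8601, ts)))
```

If the real definition behaves like this, the header timestamp is probably not what breaks the match, but swapping in DATA as suggested is still a quick way to confirm that against the actual pipeline.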

To troubleshoot this kind of issue, I generally recommend adding one field at a time from the start and capturing the rest of the message with GREEDYDATA. That lets you immediately identify what is causing the problem as you iterate through the message.

For this particular case, I would also recommend using the dissect processor instead of grok, as it is easier to configure and troubleshoot for reasonably fixed formats.
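In Logstash this would be a dissect { mapping => { "message" => "..." } } filter. As a rough illustration of how dissect-style splitting works, here is a hypothetical Python helper (not the Logstash filter): literal text between %{...} keys acts as a delimiter, and each key takes everything up to the next occurrence of that delimiter. Because the Description value can itself contain ], the example pattern (an assumption, not tested in Logstash) leaves the final ] off so the last key simply takes the rest of the line.

```python
import re
from typing import Dict, Optional


def dissect(pattern: str, text: str) -> Optional[Dict[str, str]]:
    """Hypothetical dissect-style splitter, not the Logstash filter itself.

    Literal text between %{key} markers is a delimiter; each key takes the text
    up to the next occurrence of the delimiter that follows it. A key with no
    delimiter after it takes the rest of the line.
    """
    parts = re.split(r"%\{(\w+)\}", pattern)
    literals, keys = parts[0::2], parts[1::2]  # literal, key, literal, ..., literal
    if not text.startswith(literals[0]):
        return None
    pos, result = len(literals[0]), {}
    for key, delim in zip(keys, literals[1:]):
        end = text.find(delim, pos) if delim else len(text)
        if end == -1:
            return None  # delimiter not found: the pattern does not fit this line
        result[key] = text[pos:end]
        pos = end + len(delim)
    return result


# Shortened copy of the sample message.
message = (
    '<14>1 2025-03-13T19:13:28.384816+01:00 veeamx Veeam_MP - - '
    '[origin enterpriseId="31023"] [categoryId=0 instanceId=10050 '
    'VmName="TESTWIN01" Description="Restore point for VM [TESTWIN01] has been removed."]'
)
pattern = "<%{pri}>%{ver} %{timestamp} %{host} %{app} - - [%{origin}] [%{kv_pairs}"
fields = dissect(pattern, message)
print(fields)

# Ending the pattern with "]" would stop kv_pairs at the first "]",
# which sits inside the Description value.
truncated = dissect(pattern + "]", message)
print(truncated["kv_pairs"][-20:])
```

The kv_pairs value (with its trailing ]) is then what the KV step would parse.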