Ingest Node Pipeline doesn't allow for "\[" pattern

Hi gang,

I'm trying to parse some logs into my index but I always get the GROK error: "Provided Grok expressions do not match field value"

Error Message
Provided Grok expressions do not match field value: [Sep 28 17:19:59 hostname samba: conn[ldap] c[ipv4] s[ipv4] server_id[number][number]: Auth: [LDAP,simple bind/TLS] user [(null)]\\[user_string] at [Mon, 28 Sep 2020 17:19:59.022083 CEST] with [Plaintext] status [outcome] workstation [(null)] remote host [ipv4] mapped to [domain]\\[user]. local host [ipv4]]

Actual Log
[Sep 28 17:19:59 hostname samba: conn[ldap] c[ipv4] s[ipv4] server_id[number][number]: Auth: [LDAP,simple bind/TLS] user [(null)]\\[user_string] at [Mon, 28 Sep 2020 17:19:59.022083 CEST] with [Plaintext] status [outcome] workstation [(null)] remote host [ipv4] mapped to [domain]\[user]. local host [ipv4]]

GROK Filter in Elasticsearch

{
"ignore_missing": true,
"field": "message",
"patterns": [
"%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{WORD:process.name}: conn[%{WORD:service}] c[ipv4:%{IPV4:client.address}:%{BASE10NUM:client.port}] s[ipv4:%{IPV4:source.address}:%{BASE10NUM:source.port}] server_id[%{BASE10NUM}][%{BASE10NUM}]: %{GREEDYDATA}Auth: [LDAP,simple bind/TLS] user [(null)]\[%{GREEDYDATA:user.object}] at [%{WORD:weekday}, %{GREEDYDATA:event.timestamp} %{WORD:timezone}] with [%{WORD}] status [%{WORD:event.outcome}] workstation [(null)] remote host [ipv4:%{IPV4}:%{BASE10NUM}] %{GREEDYDATA} [%{WORD:domain}]\\[%{USERNAME:user.name}]"
],
"description": "Logon"
}

GROK Filter according to https://grokdebug.herokuapp.com

%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{WORD:process.name}: conn[%{WORD:service}] c[ipv4:%{IPV4:client.address}:%{BASE10NUM:client.port}] s[ipv4:%{IPV4:source.address}:%{BASE10NUM:source.port}] server_id[%{BASE10NUM}][%{BASE10NUM}]: %{GREEDYDATA}Auth: [LDAP,simple bind/TLS] user [(null)]\[%{GREEDYDATA:user.object}] at [%{WORD:weekday}, %{GREEDYDATA:event.timestamp} %{WORD:timezone}] with [%{WORD}] status [%{WORD:event.outcome}] workstation [(null)] remote host [ipv4:%{IPV4}:%{BASE10NUM}] %{GREEDYDATA} [%{WORD:domain}]\[%{USERNAME:user.name}]

The problem, I think, comes from the part where it says "[domain]\\[user]".
The log reads [domain]\[user] (single backslash). However, when I try to add it like that to my pipeline, I get a squiggly line beneath it, marking the expression as invalid.

Adding a second backslash fixes the squiggly line and the pipeline is ready to be saved.

However, this leaves me with a Grok parser failure because for whatever reason the escape character gets interpreted too.
Did I stumble across a bug or is there another way around this?

For the record, I have tried to substitute the backslash part with a %{GREEDYDATA} tag but this led to unwanted results.

Hey

I just took a stab at the first part of your log in the pipeline simulator (Dev Tools -> Console) in Kibana 7.9. I replaced the IPv4 placeholders with sample addresses and ports so the pattern had something to parse. The \ looks like an escaping problem: you have to escape both the backslashes and the opening square brackets. The portion below parses and works within the pipeline simulator.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description" : "Samba log grok parsing",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{WORD:process.name}: conn\\[%{WORD:service}] c\\[%{IPV4:client.address}:%{BASE10NUM:client.port}] s\\[%{IPV4:source.address}:%{BASE10NUM:source.port}] server_id\\[%{BASE10NUM:sid1}]\\[%{BASE10NUM:sid2}]: Auth: \\[%{WORD:protocol},%{DATA:binding}] user \\[%{NOTSPACE:user.part0}]\\\\\\[%{NOTSPACE:user.object}]"]
        }
      }
    ]
  },
  "docs":[
    {
      "_source": {
        "message": "[Sep 28 17:19:59 hostname samba: conn[ldap] c[120.0.0.1:9807] s[120.0.0.2:80] server_id[12][13]: Auth: [LDAP,simple bind/TLS] user [(null)]\\[user_string] at [Mon, 28 Sep 2020 17:19:59.022083 CEST] with [Plaintext] status [outcome] workstation [(null)] remote host [ipv4] mapped to [domain]\\[user]. local host [ipv4]]"
      }
    }
  ]
}
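
To see why six backslashes are needed there, it can help to separate the two layers of unescaping: the JSON parser first, then the regex engine behind Grok. Below is a minimal Python sketch of that idea, not what Elasticsearch runs internally. Python's `re` module stands in for Grok's regex engine (an assumption that should hold for these simple escapes), and the helper name `decode_pattern` and the shortened `[domain]\[user]` fragment are made up for illustration.

```python
import json
import re


def decode_pattern(json_text: str) -> str:
    """Return the regex source that remains after JSON unescaping (layer 1)."""
    return json.loads(json_text)


# The log fragment as it appears on disk: [domain]\[user] (one backslash).
log_fragment = "[domain]\\[user]"

# Raw strings hold the JSON text exactly as typed in the request body.
single = r'"\[domain]\[user]"'          # one backslash before each [
double = r'"\\[domain]\\[user]"'        # two backslashes before each [
working = r'"\\[domain]\\\\\\[user]"'   # two before the first [, six before the second

# A lone backslash before [ is not a valid JSON escape, so parsing fails outright.
try:
    decode_pattern(single)
    single_is_valid_json = True
except json.JSONDecodeError:
    single_is_valid_json = False

# Two backslashes decode to \[ which the regex engine reads as "literal [",
# so nothing in the pattern accounts for the backslash in the log.
double_regex = decode_pattern(double)
double_matches = re.fullmatch(double_regex, log_fragment) is not None

# Six backslashes decode to \\\[ : a literal backslash followed by a literal [.
working_regex = decode_pattern(working)
working_matches = re.fullmatch(working_regex, log_fragment) is not None

print(single_is_valid_json, double_matches, working_matches)
```

If this model is right, the `]\\[` in front of `%{USERNAME:user.name}` in your original pattern is the `double` case: it saves fine as JSON but only ever matches `][`, which would explain at least part of the Grok failure.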

Oh, and I see 'source' in the s[xxxx:yyy] match part. Should that not be the server IP (as in, server and client IP addresses)?


Hi Jesper,

Thanks for reaching out. I fixed this issue by going back to Logstash pipelines, since I am more familiar with those.
I tried your solution just for the fun of it and it worked like a charm.

Correct assumption. I simply removed the actual IP addresses from the log due to privacy/security concerns.