Ingest pipeline grok pattern with field name containing spaces

I am using an ingest pipeline to parse a tab-separated log message coming from Filebeat. One of the fields can contain spaces. In the example below, "Gui Process" should be parsed into the sourceName field. However, what happens is "Gui" gets mapped to sourceName and "Process" gets mapped to logType. I tried a custom regex (?<sourceName>[^)]+)\s+- instead of WORD for sourceName, but it didn't help. It seems like something very simple; any help would be great. I also tried Dissect but couldn't get it to work with tabs either.

Log line:
2020-12-23T00:00:02.183-08:00 7520977794441 0x000a ABC.Laptop. Gui Process Information GDIObjects: 2078, USERHandles: 5826

Grok Pattern:
%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{NUMBER:relativeTime}%{SPACE}%{WORD:thread}%{SPACE}%{HOSTNAME:processName}%{SPACE}%{WORD:sourceName}%{SPACE}%{WORD:logType}%{SPACE}%{GREEDYDATA:message}
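
For reference, the grok processor in my ingest pipeline looks roughly like this (the pipeline name here is made up):

```json
PUT _ingest/pipeline/tab_log_pipeline
{
  "description": "parse tab separated log line",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{NUMBER:relativeTime}%{SPACE}%{WORD:thread}%{SPACE}%{HOSTNAME:processName}%{SPACE}%{WORD:sourceName}%{SPACE}%{WORD:logType}%{SPACE}%{GREEDYDATA:message}"
        ]
      }
    }
  ]
}
```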

Expected
timestamp: 2020-12-23T00:00:02.183-08:00
relativeTime: 7520977794441
thread: 0x000a
processName: ABC.Laptop.
sourceName: Gui Process
logType: Information
message: GDIObjects: 2078, USERHandles: 5826

but get
timestamp: 2020-12-23T00:00:02.183-08:00
relativeTime: 7520977794441
thread: 0x000a
processName: ABC.Laptop.
sourceName: Gui Process
logType: Process
message: Information\tGDIObjects: 2078, USERHandles: 5826

Hi,

Doing it with dissect is not very hard.

filter {
   dissect {
     mapping => {
       "message" => "%{timestamp} %{relativeTime} %{thread} %{proccessName} %{sourceName} %{+sourceName} %{logType} %{message}"
     }
   }
}

That will give you this result:

{
         "message" => "GDIObjects: 2078, USERHandles: 5826",
         "logType" => "Information",
    "relativeTime" => "7520977794441",
      "sourceName" => "Gui Process",
        "@version" => "1",
       "timestamp" => "2020-12-23T00:00:02.183-08:00",
          "thread" => "0x000a",
    "proccessName" => "ABC.Laptop.",
      "@timestamp" => 2020-12-30T07:26:10.581Z
}

However, because you say "One of the fields can have spaces", it becomes more complicated.

Now, assuming that

2020-12-23T00:00:02.183-08:00 7520977794441 0x000a ABC.Laptop.

and

Information GDIObjects: 2078, USERHandles: 5826

are always built up the same way, and only the part where you currently have "Gui Process" can differ, you could decompose the log event in three stages.

Dissect the first part:

%{timestamp} %{relativeTime} %{thread} %{message}

Create a custom regex that captures everything up to the word "Information" and stores the rest in message.
Something like this:

(?<processName>\S+)\s(?<sourceName>.+?)\s(?<message>Information\s.+)

Then dissect the remaining message.
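
Putting the three stages together, a sketch could look like this (field names come from your pattern; anchoring on the literal "Information" is just illustrative and assumes the log type always follows the source name):

```
filter {
  # Stage 1: split off the fixed prefix, keep the rest in "message"
  dissect {
    mapping => {
      "message" => "%{timestamp} %{relativeTime} %{thread} %{message}"
    }
  }
  # Stage 2: capture everything up to the word "Information"
  grok {
    match => {
      "message" => "(?<processName>\S+)\s(?<sourceName>.+?)\s(?<message>Information\s.+)"
    }
    overwrite => [ "message" ]
  }
  # Stage 3: split the log type from the rest of the message
  dissect {
    mapping => {
      "message" => "%{logType} %{message}"
    }
  }
}
```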

Not sure if this will help but it is a solution to the problem at hand.

Good luck,
Paul.

So, I was looking at some grok filters I use myself, where I combine raw regex with grok patterns, like this:

filter {
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} %{NUMBER:relativeTime} %{WORD:thread} %{HOSTNAME:processName} (?<sourceName>\w+\s\w+) %{WORD:logType} %{GREEDYDATA:message}"
    }
  }
}

This gives me the following output.

{
         "logType" => "Information",
        "@version" => "1",
      "@timestamp" => 2020-12-30T07:48:57.042Z,
      "sourceName" => "Gui Process",
       "timestamp" => "2020-12-23T00:00:02.183-08:00",
     "processName" => "ABC.Laptop.",
          "thread" => "0x000a",
    "relativeTime" => "7520977794441",
         "message" => [
        [0] "2020-12-23T00:00:02.183-08:00 7520977794441 0x000a ABC.Laptop. Gui Process Information GDIObjects: 2078, USERHandles: 5826",
        [1] "GDIObjects: 2078, USERHandles: 5826"
    ]
}
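
Note that message ends up as an array because grok, unlike dissect, appends to an existing field instead of replacing it. If you only want the parsed tail of the line, grok's overwrite option replaces the original value:

```
filter {
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} %{NUMBER:relativeTime} %{WORD:thread} %{HOSTNAME:processName} (?<sourceName>\w+\s\w+) %{WORD:logType} %{GREEDYDATA:message}"
    }
    # replace the original event text with the captured remainder
    overwrite => [ "message" ]
  }
}
```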

Hope this helps as well.

Paul.

Perfect, exactly what I needed. Thanks a lot for the explanation. I should have posted this two days ago while I was struggling to figure it out. Do you recommend Dissect or Grok? My log lines can also be multiline. I also came up with a solution using CSV with a tab separator; that worked too, but I don't think it handles multiline.

I prefer dissect; I find it easier to read in the long run. I do not know whether it is faster than grok, but I like to believe it is :slight_smile:

In regard to multiline: I noticed you send your events through Filebeat. You might want to do the multiline handling there; it is much easier to configure, as the events are still in order as they pass through Filebeat anyway.

Have a look here for multiline examples
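
As an example, a timestamp-anchored multiline config in filebeat.yml might look like this (the path is just a placeholder):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/tool/*.log   # placeholder path
    # any line that does not start with a date is appended to the previous line
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after
```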

Regards,
Paul.

Thanks, I changed to Dissect and configured Filebeat. By the way, I did get an error when I tried the grok parser (ELK 7.9.1), although it worked fine in the Grok Debugger. It doesn't matter since I am not using it :). Just FYI.

Hi Paul

I actually changed to use the csv processor with \t as the separator. This works great but fails when the message portion contains a newline character. I added the following to filebeat.yml, but it hasn't helped. Log lines start with a timestamp like 2020-12-29T08:25:01.971....

Any thoughts?

filebeat.yml
multiline.type: pattern
multiline.pattern: '^20'
multiline.match: after
multiline.negate: true

pipeline def

  "pipeline_tab" : {
    "description" : "tab pattern",
    "processors" : [
      {
        "csv" : {
          "field" : "message",
          "target_fields" : [
            "timestamp",
            "relativeTime",
            "thread",
            "processName",
            "sourceName",
            "logType",
            "logMessage"
          ],
          "separator" : "\t"
        }
      }
    ]
  }
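
For what it is worth, the pipeline can be tried out against a sample line with the Simulate Pipeline API:

```json
POST _ingest/pipeline/pipeline_tab/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "2020-12-29T08:25:01.971-08:00\t69207946792\t0x0017\tTool.\tBrooksRobot\tEntryExit\tExiting RobotCommLib.GetReferenceStatusCommand  after 100 ms"
      }
    }
  ]
}
```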

Hi,

Sorry for the delay. Would it be possible to share a couple of examples? It would be hard to tell otherwise.

Paul.

Here you go

2020-12-29T08:25:01.971-08:00	69207946792	0x0017	Tool.	BrooksRobot	EntryExit	Exiting RobotCommLib.GetReferenceStatusCommand  after 100 ms 
2020-12-29T08:25:02.071-08:00	69208046761	0x002f	Tool.	BrooksRobot	EntryExit	Entering RobotCommLib.GetCurrentThetaRZCommand 
2020-12-29T08:25:02.145-08:00	69208120079	0x0013	Tool.	OptoEvents	Background	OptoMessageReceived Enter 
 RxMsg 008,01004040020A0000,0000000A,08:25:01,
2020-12-29T08:25:02.145-08:00	69208120092	0x0013	Tool.	OptoEvents	Background	SendOptoAcknowledgement Enter 
 AckMsg = 008,01004040020A0000,0000000A,08:25:01,
2020-12-29T08:25:02.145-08:00	69208120124	0x0013	Tool.	OptoEvents	Background	OptoEventAcknowledged: eDigitalPointPushEventID
2020-12-29T08:25:02.145-08:00	69208120159	0x0014	Tool.	OptoEvents	Background	OptoEvent received message: 008,01004040020A0000,0000000A,08:25:01,

Lines 3 and 4 are the multiline ones.

Hi,

Your multiline pattern is not "^20" but "^ ", as the continuation lines start with a space and belong to the previous line.

With your example, this multiline config works for me.

multiline.pattern: '^ '
multiline.negate: false
multiline.match: after

I set it as below

multiline.type: pattern
multiline.pattern: '^20'
multiline.match: after
multiline.negate: true

because I want to treat all lines starting with 20* as log lines. That is why I set multiline.negate: true, meaning any line that does not start with 20 should be considered part of the previous line, and multiline.match: after, meaning those lines are appended after the line starting with 20*. I don't necessarily want to say that any line starting with a blank is a continuation line. If that is the only way to do it, I guess I have no choice. Any idea why the negate option with ^20 wouldn't work?