I am using an ingest pipeline to parse a tab-separated log message coming from Filebeat. One of the fields can contain spaces. In the example below, "Gui Process" should be parsed into the sourceName field. What happens instead is that "Gui" gets mapped to sourceName and "Process" gets mapped to logType. I tried a custom regex, (?[^)]+)\s+-, instead of WORD for sourceName, but it didn't help. This seems like something very simple, so any help would be great. I also tried Dissect but couldn't get it to work with tabs either.
Log line:
2020-12-23T00:00:02.183-08:00 7520977794441 0x000a ABC.Laptop. Gui Process Information GDIObjects: 2078, USERHandles: 5826
Expected:
timestamp: 2020-12-23T00:00:02.183-08:00
relativeTime: 7520977794441
thread: 0x000a
processName: ABC.Laptop.
sourceName: Gui Process
logType: Information
message: GDIObjects: 2078, USERHandles: 5826
But I get:
timestamp: 2020-12-23T00:00:02.183-08:00
relativeTime: 7520977794441
thread: 0x000a
processName: ABC.Laptop.
sourceName: Gui
logType: Process
message: Information\tGDIObjects: 2078, USERHandles: 5826
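To illustrate why splitting on the tab character alone keeps "Gui Process" together, here is a minimal Python sketch. It is not the ingest pipeline itself, just the same idea a tab-delimited Dissect or CSV setup relies on. It assumes the seven fields are separated by single tab characters, as described above; the `parse_line` helper is hypothetical and only for illustration.

```python
# Field names taken from the expected output in the question.
FIELDS = [
    "timestamp", "relativeTime", "thread", "processName",
    "sourceName", "logType", "message",
]


def parse_line(line: str) -> dict:
    """Split a log line on tabs only, so spaces inside a field are preserved."""
    # Limit the number of splits so any tab inside the message stays in the last field.
    parts = line.rstrip("\n").split("\t", len(FIELDS) - 1)
    return dict(zip(FIELDS, parts))


# Assumption: the original line uses a tab between each field.
sample = "\t".join([
    "2020-12-23T00:00:02.183-08:00",
    "7520977794441",
    "0x000a",
    "ABC.Laptop.",
    "Gui Process",
    "Information",
    "GDIObjects: 2078, USERHandles: 5826",
])

parsed = parse_line(sample)
for name, value in parsed.items():
    print(f"{name}: {value}")
```

A whitespace-based pattern such as WORD stops at the first space, which is why "Gui" and "Process" end up in separate fields; a delimiter that is strictly the tab character avoids that.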
Perfect, exactly what I needed. Thanks a lot for the explanation. I should have posted this two days ago while I was struggling to figure it out. Do you recommend Dissect or Grok? My log lines can also span multiple lines. I also came up with a solution using the CSV processor with a tab separator. That worked as well, but I don't think it handles multiline messages.
I prefer Dissect; I find it easier to read in the long run. I do not know whether it is faster than Grok, but I like to believe it is.
In regard to multiline: I noticed you send your events through Filebeat. You might want to do the multiline handling there. It is much easier to configure, since the events are already in order as they pass through Filebeat anyway.
Thanks, I changed to Dissect and configured Filebeat. By the way, I did get an error when I tried the Grok parser (ELK 7.9.1), although it worked fine in the Grok debugger. It doesn't matter since I am not using it :). Just FYI.
I actually changed to the csv processor with \t as the separator. This works great but fails when the message portion contains a newline character. I added the following to filebeat.yml, but it hasn't helped. Log lines start with a timestamp like 2020-12-29T08:25:01.971....
Any thoughts?
filebeat.yml
multiline.type: pattern
multiline.pattern: '^20'
multiline.match: after
multiline.negate: true
I set it up this way because I want every line starting with 20 to be treated as the start of a log line. That is why I set multiline.negate: true, meaning any line that does not start with 20 should be appended to the previous line. I set multiline.match: after, meaning all lines following a line that starts with 20 should become part of that line. I don't necessarily want to say that a line starting with a blank is a continuation line. If that is the only way to do it, I guess I have no choice. Any idea why the negate option with ^20 wouldn't work?
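For reference, the grouping described above (pattern ^20, negate: true, match: after) can be sketched in plain Python. This is only a model of the intended behaviour, not Filebeat's implementation, and it says nothing about why the setting is not taking effect. The `group_multiline` helper and the second sample timestamp are made up for illustration.

```python
import re


def group_multiline(lines, pattern=r"^20"):
    """Append every line that does NOT match the pattern to the preceding event."""
    start = re.compile(pattern)
    events = []
    for line in lines:
        if start.search(line) or not events:
            # A line matching the pattern (or the very first line) starts a new event.
            events.append(line)
        else:
            # Non-matching lines are continuation lines of the previous event.
            events[-1] += "\n" + line
    return events


# Illustrative input: two events, the first with two continuation lines.
raw_lines = [
    "2020-12-29T08:25:01.971\tfirst message",
    "  continuation with leading blanks",
    "continuation without leading blanks",
    "2020-12-29T08:25:02.000\tsecond message",
]

events = group_multiline(raw_lines)
print(len(events))
```

Under this model a continuation line does not need to start with a blank; anything not beginning with 20 is folded into the previous event, which matches the intent of the settings.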