Multiline / multipattern


(Jeremy Gooch) #1

Hi,

I'm trying to configure Filebeat to process a log file in which records usually span multiple lines and are separated by a blank line, but occasionally are not.

Here's an example:-

2018-07-02T21:10:09.775 Start ProcessXMLMessage
2018-07-02T21:10:09.775 Start ThingXML
2018-07-02T21:10:09.776 Before CommitData. CodeID=AQO2 741392570000103001 User=5001 Date=02072018 Time=210912 Scan=ABC1 Length=0041 Width=0059 Height=0061 Weight=0000000000
2018-07-02T21:10:09.799 End WeightXML. op_Errors=
2018-07-02T21:10:09.802 End ProcessXMLMessage

2018-07-02T21:10:09.923 Start ProcessXMLMessage
2018-07-02T21:10:09.924 Start ThingXML
2018-07-02T21:10:09.926 Before CommitData. CodeID=AHF5 939988635943627001 User=5001 Date=02072018 Time=210841 Scan=ABC1 Length=0006 Width=0018 Height=0021 Weight=0000000000
2018-07-02T21:10:10.071 End WeightXML. op_Errors=
2018-07-02T21:10:10.072 End ProcessXMLMessage

2018-06-30T22:21:58.211 Start ProcessXMLMessage
2018-06-30T22:21:58.212 Start IODXML
2018-06-30T22:21:58.213 IODXML Item=50000003388090
2018-06-30T22:21:58.214 ProcessData  User=170005 ItemNumber=50000003388090 Items=1 FailureCode=00 ProcessedDate=30062018 ProcessedTime=1415
2018-06-30T22:21:58.215 ProcessData ll_ValidFailCode=TRUE  FailureCode=00 ldte_ImpDate=30/06/2018
2018-06-30T22:21:58.215 ProcessData GPS Coordinates=51.754193,0.006483 GPS DoP=1 GPS Date/Time=30/06/2018 14:15:51
2018-06-30T22:21:58.240 ProcessData loop end ItemNumber=50000003388090

2018-07-02T21:10:45.595 Item Number 31015080070677 Does Not Exist (DISCREPSCN1.10      G3101574370677000                      501081910710071009451050DC01                              )
2018-07-02T21:11:09.381 Start ProcessXMLMessage
2018-07-02T21:11:09.383 Start WeightXML
2018-07-02T21:11:09.387 Before CommitScanData. CodeID=ABA14831408400321777001 User=5001 Date=02072018 Time=210936 Scan=ABC1 Length=0000 Width=0000 Height=0000 Weight=0000000000
2018-07-02T21:11:09.422 End WeightXML. op_Errors=
2018-07-02T21:11:09.423 End ProcessXMLMessage

I've successfully grouped the multi-line records by matching lines that contain a digit (every data line is timestamped, so only the blank lines fail to match), using this simple pattern:-

multiline.pattern: '[0-9]'
multiline.negate: false
multiline.match: after

However, the data above contains five records, not four. The fifth record is the single line six rows from the end ("Item Number ... Does Not Exist"). This particular message is the only one that appears as a single line and the only one that is not surrounded by blank lines.
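The merge can be reproduced offline with a small model. The sketch below is a simplified Python stand-in for the documented negate: false / match: after behaviour (a line that matches is appended to the current event, a line that does not match starts a new one). The helper name group_after and the shortened sample lines are illustrative assumptions, not Filebeat code, and Python's re module stands in for Filebeat's Go regexp engine.

```python
import re

def group_after(lines, pattern, negate=False):
    """Simplified model of multiline.match: after (not Filebeat's real code)."""
    rx = re.compile(pattern)
    events = []
    for line in lines:
        # With negate=False a matching line continues the current event.
        continues = bool(rx.search(line)) != negate
        if events and continues:
            events[-1].append(line)
        else:
            events.append([line])
    # Drop the blank separator lines so only real log lines remain.
    cleaned = [[l for l in ev if l] for ev in events]
    return [ev for ev in cleaned if ev]

# Shortened lines taken from the example above.
sample = [
    "2018-07-02T21:10:09.775 Start ProcessXMLMessage",
    "2018-07-02T21:10:09.802 End ProcessXMLMessage",
    "",
    "2018-06-30T22:21:58.211 Start ProcessXMLMessage",
    "2018-06-30T22:21:58.240 ProcessData loop end ItemNumber=50000003388090",
    "",
    "2018-07-02T21:10:45.595 Item Number 31015080070677 Does Not Exist",
    "2018-07-02T21:11:09.381 Start ProcessXMLMessage",
    "2018-07-02T21:11:09.423 End ProcessXMLMessage",
]

events = group_after(sample, r"[0-9]")
for ev in events:
    print(len(ev), "line(s), first:", ev[0])
```

In this model the nine sample lines hold four logical records but produce only three events, because the "Does Not Exist" line contains digits and is therefore treated as a continuation line like any other timestamped line; only a blank line can start a new event.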

I've tried a regexp to look for blank lines and the "Does Not Exist" line, but it doesn't seem to work, possibly because I want to discard the blank lines but keep the "Does Not Exist" one:-

(^\r?\n)|(Does Not Exist)

Can anyone point me towards a multiline config that will split this data up as required?

Thanks,

J.


(ruflin) #2

I would have followed the same approach as you did. Can you share what exactly didn't work and what the results were?


(Jeremy Gooch) #3

If I use this config...

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\temp\filebeat-test\*
  multiline.pattern: '(^\r?\n)|(Does Not Exist)'
  multiline.negate: true
  multiline.match: after

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

output.console:
  pretty: true

...then I get two records:-

{
  "@timestamp": "2018-07-13T09:24:25.044Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.3.0"
  },
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "name": "51ITLT898F5B",
    "hostname": "51ITLT898F5B",
    "version": "6.3.0"
  },
  "host": {
    "name": "51ITLT898F5B"
  },
  "source": "C:\\temp\\filebeat-test\\test - Copy (2).txt",
  "offset": 1295,
  "message": "2018-07-02T21:10:09.775 Start ProcessXMLMessage\n2018-07-02T21:10:09.775 Start ThingXML\n2018-07-02T21:10:
09.776 Before CommitData. CodeID=AQO2 741392570000103001 User=5001 Date=02072018 Time=210912 Scan=ABC1 Length=0041 Width
=0059 Height=0061 Weight=0000000000\n2018-07-02T21:10:09.799 End WeightXML. op_Errors=\n2018-07-02T21:10:09.802 End Proc
essXMLMessage\n\n2018-07-02T21:10:09.923 Start ProcessXMLMessage\n2018-07-02T21:10:09.924 Start ThingXML\n2018-07-02T21:
10:09.926 Before CommitData. CodeID=AHF5 939988635943627001 User=5001 Date=02072018 Time=210841 Scan=ABC1 Length=0006 Wi
dth=0018 Height=0021 Weight=0000000000\n2018-07-02T21:10:10.071 End WeightXML. op_Errors=\n2018-07-02T21:10:10.072 End P
rocessXMLMessage\n\n2018-06-30T22:21:58.211 Start ProcessXMLMessage\n2018-06-30T22:21:58.212 Start IODXML\n2018-06-30T22
:21:58.213 IODXML Item=50000003388090\n2018-06-30T22:21:58.214 ProcessData  User=170005 ItemNumber=50000003388090 Items=
1 FailureCode=00 ProcessedDate=30062018 ProcessedTime=1415\n2018-06-30T22:21:58.215 ProcessData ll_ValidFailCode=TRUE  F
ailureCode=00 ldte_ImpDate=30/06/2018\n2018-06-30T22:21:58.215 ProcessData GPS Coordinates=51.754193,0.006483 GPS DoP=1
GPS Date/Time=30/06/2018 14:15:51\n2018-06-30T22:21:58.240 ProcessData loop end ItemNumber=50000003388090\n"
}
{
  "@timestamp": "2018-07-13T09:24:25.045Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.3.0"
  },
  "source": "C:\\temp\\filebeat-test\\test - Copy (2).txt",
  "offset": 1801,
  "message": "2018-07-02T21:10:45.595 Item Number 31015080070677 Does Not Exist (DISCREPSCN1.10      G3101574370677000
                    501081910710071009451050DC01                              )\n2018-07-02T21:11:09.381 Start ProcessXM
LMessage\n2018-07-02T21:11:09.383 Start WeightXML\n2018-07-02T21:11:09.387 Before CommitScanData. CodeID=ABA148314084003
21777001 User=5001 Date=02072018 Time=210936 Scan=ABC1 Length=0000 Width=0000 Height=0000 Weight=0000000000\n2018-07-02T
21:11:09.422 End WeightXML. op_Errors=",
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "name": "51ITLT898F5B",
    "hostname": "51ITLT898F5B",
    "version": "6.3.0"
  },
  "host": {
    "name": "51ITLT898F5B"
  }
}

If I set multiline.negate: false, I get one record for every line of the file, except that the blank lines and the last line of the file are dropped.


(Jeremy Gooch) #4

[...continuation from previous post]

And if I use this config...

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\temp\filebeat-test\*
  multiline.pattern: '[0-9]'
  multiline.negate: false
  multiline.match: after

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

output.console:
  pretty: true

...then I get this result (the "Does Not Exist" line has been merged into the fourth event together with the record that follows it, and you'll also note that the last line has been omitted):-

{
  "@timestamp": "2018-07-13T09:31:36.599Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.3.0"
  },
  "source": "C:\\temp\\filebeat-test\\test - Copy (4).txt",
  "offset": 361,
  "message": "2018-07-02T21:10:09.775 Start ProcessXMLMessage\n2018-07-02T21:10:09.775 Start ThingXML\n2018-07-02T21:10:
09.776 Before CommitData. CodeID=AQO2 741392570000103001 User=5001 Date=02072018 Time=210912 Scan=ABC1 Length=0041 Width
=0059 Height=0061 Weight=0000000000\n2018-07-02T21:10:09.799 End WeightXML. op_Errors=\n2018-07-02T21:10:09.802 End Proc
essXMLMessage",
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "name": "51ITLT898F5B",
    "hostname": "51ITLT898F5B",
    "version": "6.3.0"
  },
  "host": {
    "name": "51ITLT898F5B"
  }
}
{
  "@timestamp": "2018-07-13T09:31:36.599Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.3.0"
  },
  "message": "2018-07-02T21:10:09.923 Start ProcessXMLMessage\n2018-07-02T21:10:09.924 Start ThingXML\n2018-07-02T21:10:
09.926 Before CommitData. CodeID=AHF5 939988635943627001 User=5001 Date=02072018 Time=210841 Scan=ABC1 Length=0006 Width
=0018 Height=0021 Weight=0000000000\n2018-07-02T21:10:10.071 End WeightXML. op_Errors=\n2018-07-02T21:10:10.072 End Proc
essXMLMessage",
  "input": {
    "type": "log"
  },
  "prospector": {
    "type": "log"
  },
  "beat": {
    "hostname": "51ITLT898F5B",
    "version": "6.3.0",
    "name": "51ITLT898F5B"
  },
  "host": {
    "name": "51ITLT898F5B"
  },
  "source": "C:\\temp\\filebeat-test\\test - Copy (4).txt",
  "offset": 724
}
{
  "@timestamp": "2018-07-13T09:31:36.599Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.3.0"
  },
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "host": {
    "name": "51ITLT898F5B"
  },
  "beat": {
    "name": "51ITLT898F5B",
    "hostname": "51ITLT898F5B",
    "version": "6.3.0"
  },
  "source": "C:\\temp\\filebeat-test\\test - Copy (4).txt",
  "offset": 1293,
  "message": "2018-06-30T22:21:58.211 Start ProcessXMLMessage\n2018-06-30T22:21:58.212 Start IODXML\n2018-06-30T22:21:58
.213 IODXML Item=50000003388090\n2018-06-30T22:21:58.214 ProcessData  User=170005 ItemNumber=50000003388090 Items=1 Fail
ureCode=00 ProcessedDate=30062018 ProcessedTime=1415\n2018-06-30T22:21:58.215 ProcessData ll_ValidFailCode=TRUE  Failure
Code=00 ldte_ImpDate=30/06/2018\n2018-06-30T22:21:58.215 ProcessData GPS Coordinates=51.754193,0.006483 GPS DoP=1 GPS Da
te/Time=30/06/2018 14:15:51\n2018-06-30T22:21:58.240 ProcessData loop end ItemNumber=50000003388090"
}
{
  "@timestamp": "2018-07-13T09:31:36.599Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.3.0"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "hostname": "51ITLT898F5B",
    "version": "6.3.0",
    "name": "51ITLT898F5B"
  },
  "host": {
    "name": "51ITLT898F5B"
  },
  "source": "C:\\temp\\filebeat-test\\test - Copy (4).txt",
  "offset": 1801,
  "message": "2018-07-02T21:10:45.595 Item Number 31015080070677 Does Not Exist (DISCREPSCN1.10      G3101574370677000
                    501081910710071009451050DC01                              )\n2018-07-02T21:11:09.381 Start ProcessXM
LMessage\n2018-07-02T21:11:09.383 Start WeightXML\n2018-07-02T21:11:09.387 Before CommitScanData. CodeID=ABA148314084003
21777001 User=5001 Date=02072018 Time=210936 Scan=ABC1 Length=0000 Width=0000 Height=0000 Weight=0000000000\n2018-07-02T
21:11:09.422 End WeightXML. op_Errors=",
  "prospector": {
    "type": "log"
  }
}

Thanks!


(ruflin) #5

I suspect something is off with the empty-line check: the newline may already have been removed by the time the pattern is matched against the line.

Have a look at our playground for multiline regexps, which shows which lines a pattern matches and which it does not: https://www.elastic.co/guide/en/beats/filebeat/6.3/_test_your_regexp_pattern_for_multiline.html
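The suspicion about the empty-line check can be illustrated without Filebeat. The sketch below assumes, as suggested above, that the line terminator has already been stripped before the pattern is applied. It uses Python's re module rather than Filebeat's Go regexp engine, and the alternative pattern '^\s*$' is a hypothetical suggestion that has not been tested against Filebeat in this thread.

```python
import re

# Blank-line half of the pattern used earlier in the thread.
blank_with_newline = re.compile(r"^\r?\n")
# Hypothetical alternative: a line that is empty or whitespace only.
blank_stripped = re.compile(r"^\s*$")

raw_line = "\r\n"     # a blank line as stored in a Windows log file
stripped_line = ""    # the same line once its terminator has been removed

# The original pattern needs the newline character to be present.
matches_raw = bool(blank_with_newline.search(raw_line))
matches_stripped = bool(blank_with_newline.search(stripped_line))
# The alternative does not depend on the terminator.
alt_matches_stripped = bool(blank_stripped.search(stripped_line))

print(matches_raw, matches_stripped, alt_matches_stripped)
```

If the assumption holds, the blank-line half of '(^\r?\n)|(Does Not Exist)' can never match, which would leave "Does Not Exist" as the only line able to start a new event and is consistent with the two records reported earlier.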


(Jeremy Gooch) #6

Hi,

Thanks again for the response. I had a brainwave and solved this with a much easier approach. I'd been treating the blank lines as the discriminator. I suddenly realised that it would be more straightforward to split on the line "Start ProcessXMLMessage", which begins every record except the single-line "Does Not Exist" one.

So I just put both of these into the regexp:-

multiline.pattern: '(Start ProcessXMLMessage)|(Does Not Exist)|(Closing log file)'
multiline.negate: true
multiline.match: after

NB I also added "Closing log file" in order to catch the last record of the file (as I'd spotted that the final group of messages wasn't being captured). To be fair, my example above didn't include the end of the file, so I'm only mentioning it here to explain my earlier comment about the missing last line.
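A quick offline check of the final pattern is sketched below. It relies on the documented meaning of negate: true with match: after (lines that do not match are appended to the previous line that does match), so every matching line begins a new event. The sample lines are shortened from the original post, no "Closing log file" line is included because its exact format is not shown in the thread, and Python's re module is a stand-in for Filebeat's Go regexp engine.

```python
import re

# Final pattern from the config above.
PATTERN = re.compile(
    r"(Start ProcessXMLMessage)|(Does Not Exist)|(Closing log file)"
)

# Shortened lines taken from the example in the first post.
sample = [
    "2018-07-02T21:10:09.775 Start ProcessXMLMessage",
    "2018-07-02T21:10:09.802 End ProcessXMLMessage",
    "",
    "2018-06-30T22:21:58.211 Start ProcessXMLMessage",
    "2018-06-30T22:21:58.240 ProcessData loop end ItemNumber=50000003388090",
    "",
    "2018-07-02T21:10:45.595 Item Number 31015080070677 Does Not Exist",
    "2018-07-02T21:11:09.381 Start ProcessXMLMessage",
    "2018-07-02T21:11:09.423 End ProcessXMLMessage",
]

# With negate: true / match: after, a matching line starts a new event
# and every other line (blank ones included) is appended to it.
starts_event = [bool(PATTERN.search(line)) for line in sample]
record_count = sum(starts_event)

for flag, line in zip(starts_event, sample):
    print("NEW " if flag else "    ", line)
print("events:", record_count)
```

The count comes out at four events for the four logical records in the sample, with the "Does Not Exist" line on its own. Note that in this scheme the blank separator lines are appended to the end of the preceding event rather than discarded.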

If the above hadn't worked, I realised that I could also use Logstash to add another layer of splitting - so do the blank line splitting in Filebeat and then process the difficult "Does Not Exist" case in Logstash.

J.


(ruflin) #7

Glad you found a solution and thanks for sharing it.

