Filebeat multiline filter and unknown escape character


(Wayne Hunter) #1

I'm configuring Filebeat to treat any line that does not start with a date as a continuation of the previous line. The dates come in 3 formats, as shown in the configuration snippet below.

    # Date with hyphen separator.
    pattern: "^(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])"
    negate: true
    match: after
    
    # Date without separator.
    pattern: "^(19|20)\d\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])"
    negate: true
    match: after

    # Syslog date.
    pattern: "^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d\d"
    negate: true
    match: after

This results in the following message:
Loading config file error: YAML config parsing failed on C:/Programs/elastic/filebeat-1.1.0-windows/cdc.yml: yaml: line 138: found unknown escape character. Exiting.

If I remove the double quotes from the first regular expression, the error jumps to the second expression:
Loading config file error: YAML config parsing failed on C:/Programs/elastic/filebeat-1.1.0-windows/cdc.yml: yaml: line 143: found unknown escape character. Exiting.

And so on, until I remove the quotes from the final expression, at which point I get the following:
2016/02/03 13:51:44.614181 geolite.go:61: INFO Loaded GeoIP data from: C:/Programs/elastic/filebeat-1.1.0-windows/GeoLiteCity.dat
2016/02/03 13:51:44.615181 logstash.go:106: INFO Max Retries set to: 3
2016/02/03 13:51:44.623181 outputs.go:119: INFO Activated logstash as output plugin.
2016/02/03 13:51:44.623181 publish.go:288: INFO Publisher name: NODE-1
2016/02/03 13:51:44.629182 async.go:78: INFO Flush Interval set to: 1s
2016/02/03 13:51:44.630182 async.go:84: INFO Max Bulk Size set to: 2048
2016/02/03 13:51:44.630182 beat.go:147: INFO Init Beat: filebeat; Version: 1.1.0
2016/02/03 13:51:44.632182 beat.go:173: INFO filebeat sucessfully setup. Start running.
2016/02/03 13:51:44.632182 registrar.go:66: INFO Registry file set to: C:\ProgramData\filebeat\registry
2016/02/03 13:51:44.632182 registrar.go:76: INFO Loading registrar data from C:\ProgramData\filebeat\registry
2016/02/03 13:51:44.633182 prospector.go:127: INFO Set ignore_older duration to 8760h0m0s
2016/02/03 13:51:44.633182 prospector.go:127: INFO Set scan_frequency duration to 10s
2016/02/03 13:51:44.633182 prospector.go:87: INFO Input type set to: log
2016/02/03 13:51:44.633182 prospector.go:127: INFO Set backoff duration to 1s
2016/02/03 13:51:44.633182 prospector.go:127: INFO Set max_backoff duration to 10s
2016/02/03 13:51:44.633182 prospector.go:107: INFO force_close_file is disabled
2016/02/03 13:51:44.633182 prospector.go:137: INFO Starting prospector of type: log
2016/02/03 13:51:44.634182 spooler.go:77: INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2016/02/03 13:51:44.634182 crawler.go:78: INFO All prospectors initialised with 0 states to persist
2016/02/03 13:51:44.634182 registrar.go:83: INFO Starting Registrar
2016/02/03 13:51:44.634182 log.go:113: INFO Harvester started for file: C:/Programs/elastic/logs/artifactory/server.log
2016/02/03 13:51:44.634182 log.go:135: ERR Stop Harvesting. Unexpected encoding line reader error: error parsing regexp: invalid escape sequence: \d

I know all the expressions are good as they have been tested elsewhere. Does anyone know how to overcome this?

Thanks,


(ruflin) #2

The first thing, which is probably not related to this error, is that Filebeat only supports one pattern per prospector. It seems like you want to use all 3 in one prospector, or is that just an excerpt?

We currently use POSIX regexp but are thinking about moving away from it: https://github.com/elastic/beats/issues/912

Have a look at this issue, which could help track down the problem with \d: https://github.com/elastic/beats/issues/740
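To make the POSIX point concrete, here is a minimal Go sketch. It assumes that the "POSIX regexp" mentioned above corresponds to Go's `regexp.CompilePOSIX` (the error text in the first post has the shape of Go's regexp errors, but the exact call Filebeat makes is an assumption here). The helper name `compilesPOSIX` is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// compilesPOSIX reports whether Go's POSIX-mode compiler accepts the pattern.
// Hypothetical helper, not part of Filebeat.
func compilesPOSIX(pattern string) bool {
	_, err := regexp.CompilePOSIX(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`^(19|20)\d\d`,              // Perl class \d: rejected in POSIX mode
		`^(19|20)[0-9][0-9]`,        // bracket expression: accepted
		`^(19|20)[[:digit:]]{2}`,    // POSIX character class: accepted
		`([- /.])(0[1-9]|1[012])\2`, // backreference \2: rejected
	}
	for _, p := range patterns {
		fmt.Printf("%-30s accepted=%v\n", p, compilesPOSIX(p))
	}
}
```

Note that the `\2` backreference in the first date pattern is rejected as well. Go's regexp package has no backreference support in either mode, so that part of the expression needs rewriting regardless of how `\d` is handled.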


(Wayne Hunter) #3

Thanks, I've added another backslash; however, multiline just won't work for me.

Here's my configuration:

filebeat:
  prospectors:
    -
      paths: 
        - C:/Programs/elastic/logs/artifactory/server.log
      document_type: artifactory-server
      ignore_older: 17520h
      multiline:
        pattern: ^(19|20)\\d\\d([- /.])(0[1-9]|1[012])\\2(0[1-9]|[12][0-9]|3[01])
        negate: true
        match: after      
output:
  logstash:
    enabled: true
    hosts: ["localhost:5044"]

And here's a log snippet:

2015-03-02 18:25:16,499 [art-init] [INFO ] (o.a.w.s.ArtifactoryContextConfigListener:283) -

Version: 3.5.2.1
Revision: 30160
Artifactory Home: '/var/opt/jfrog/artifactory'

2015-03-02 18:25:16,813 [art-init] [INFO ] (o.a.s.ArtifactoryApplicationContext:211) - Refreshing Artifactory: startup date [Mon Mar 02 18:25:16 CET 2015]; root of context hierarchy

Filebeat rolls up all the lines into a single event irrespective of whether the line begins with a date or not.
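One possible explanation, sketched below in Go: in an unquoted YAML value, backslashes are not escape characters, so `\\d` would reach the regex compiler as two backslashes followed by `d`. In POSIX mode that compiles, but it means "a literal backslash, then the letter d", which never occurs in this log. The sketch assumes the pattern text is passed through verbatim and that Go's `regexp.CompilePOSIX` is the compiler in use. The names `asReadFromYAML`, `rewritten` and `matchesPOSIX` are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// asReadFromYAML is the pattern text exactly as written in the unquoted YAML
// value above. Assumption: it reaches the regex compiler unchanged.
const asReadFromYAML = `^(19|20)\\d\\d([- /.])(0[1-9]|1[012])\\2(0[1-9]|[12][0-9]|3[01])`

// rewritten avoids \d and the \2 backreference. It no longer requires both
// separators to be the same character.
const rewritten = `^(19|20)[0-9][0-9][- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])`

// logLine is the first line of the log snippet above.
const logLine = "2015-03-02 18:25:16,499 [art-init] [INFO ] (o.a.w.s.ArtifactoryContextConfigListener:283) -"

// matchesPOSIX compiles pattern in POSIX mode and tests it against line.
func matchesPOSIX(pattern, line string) (bool, error) {
	re, err := regexp.CompilePOSIX(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(line), nil
}

func main() {
	for _, p := range []string{asReadFromYAML, rewritten} {
		ok, err := matchesPOSIX(p, logLine)
		fmt.Printf("match=%v err=%v\n  %s\n", ok, err, p)
	}
}
```

If no line ever matches and `negate: true` is set, every line counts as a continuation, which would be consistent with everything being rolled into one event.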


(ruflin) #4

Could you try a simpler example, just to see that it works? For example, just pick all lines that start with a number?


(Wayne Hunter) #5

I whittled the expression to:

    pattern: ^[0-9]+

And now the behaviour is as expected. However, are there plans to use the same engine as Logstash, so that grok expressions and pattern files may be used?


(Steffen Siering) #6

YAML has quite a few escaping rules for strings, depending on how the string is written in the file (there are something like 5 different rules). I'd recommend putting regexes in single quotes.

Here is a small script to test regexes on some content. Adapt pattern, negate and content to check what your regex matches.


(Maxim Gueivandov) #7

Yes, it seems that Filebeat's multiline feature concatenates all lines in the file into one single message if the pattern never matches. So you'd need to double-check your regular expression.


(Jerry Hoffmeister) #8

Wondering if there's any guidance on including, for example, \d in a regex in the Filebeat multiline pattern? I have the same issue and can work around it the same way as above, but it would be nice not to have to.


(Steffen Siering) #9

You can find some minimal documentation on regex support in our docs.


(Jerry Hoffmeister) #10

Thanks, I see \d and most of the other Perl character classes aren't supported. I swear I looked at that doc last week...
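Given that, and the earlier note that only one pattern is allowed per prospector, the three date patterns from the first post could be folded into a single alternation that uses `[0-9]` in place of `\d` and drops the `\2` backreference. The Go sketch below checks such a pattern in POSIX mode. The constant `datePrefix` and the helper `startsWithDate` are illustrative names, the sample lines are made up, and whether Filebeat accepts the pattern unchanged is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// datePrefix combines the three date formats into one pattern.
// Without a backreference, mixed separators such as 2016-02/03 also match.
const datePrefix = `^((19|20)[0-9][0-9][- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])` +
	`|(19|20)[0-9][0-9](0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])` +
	`|(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [0-9][0-9])`

var datePrefixRE = regexp.MustCompilePOSIX(datePrefix)

// startsWithDate reports whether line begins with one of the three formats.
func startsWithDate(line string) bool {
	return datePrefixRE.MatchString(line)
}

func main() {
	samples := []string{
		"2015-03-02 18:25:16,499 [art-init] [INFO ]",
		"2016/02/03 13:51:44.634182 registrar.go:83",
		"20160203 135144 started",
		"Feb 03 13:51:44 host app: started",
		"Version: 3.5.2.1",
	}
	for _, line := range samples {
		fmt.Printf("%-5v %q\n", startsWithDate(line), line)
	}
}
```

In the config file the pattern would go inside single quotes, as recommended earlier in the thread, so that YAML leaves its contents untouched.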


(Chen Augustin) #11

I strongly agree that Filebeat multiline should use the same engine as Logstash, so that the regex patterns from the multiline codec can be used in Filebeat multiline. @ruflin @steffens

