Filebeat multiline filter and unknown escape character

I'm configuring Filebeat to multiline any line that does not start with a date in one of three formats, as shown in the configuration snippets below.

    # Date with hyphen separator.
    pattern: "^(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])"
    negate: true
    match: after
    
    # Date without separator.
    pattern: "^(19|20)\d\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])"
    negate: true
    match: after

    # Syslog date.
    pattern: "^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d\d"
    negate: true
    match: after

This results in the following message:
Loading config file error: YAML config parsing failed on C:/Programs/elastic/filebeat-1.1.0-windows/cdc.yml: yaml: line 138: found unknown escape character. Exiting.

If I remove the double quotes from the first regular expression, the error jumps to the second expression:
Loading config file error: YAML config parsing failed on C:/Programs/elastic/filebeat-1.1.0-windows/cdc.yml: yaml: line 143: found unknown escape character. Exiting.

And so on, until I remove the quotes from the final expression and get the following:
2016/02/03 13:51:44.614181 geolite.go:61: INFO Loaded GeoIP data from: C:/Programs/elastic/filebeat-1.1.0-windows/GeoLiteCity.dat
2016/02/03 13:51:44.615181 logstash.go:106: INFO Max Retries set to: 3
2016/02/03 13:51:44.623181 outputs.go:119: INFO Activated logstash as output plugin.
2016/02/03 13:51:44.623181 publish.go:288: INFO Publisher name: NODE-1
2016/02/03 13:51:44.629182 async.go:78: INFO Flush Interval set to: 1s
2016/02/03 13:51:44.630182 async.go:84: INFO Max Bulk Size set to: 2048
2016/02/03 13:51:44.630182 beat.go:147: INFO Init Beat: filebeat; Version: 1.1.0
2016/02/03 13:51:44.632182 beat.go:173: INFO filebeat sucessfully setup. Start running.
2016/02/03 13:51:44.632182 registrar.go:66: INFO Registry file set to: C:\ProgramData\filebeat\registry
2016/02/03 13:51:44.632182 registrar.go:76: INFO Loading registrar data from C:\ProgramData\filebeat\registry
2016/02/03 13:51:44.633182 prospector.go:127: INFO Set ignore_older duration to 8760h0m0s
2016/02/03 13:51:44.633182 prospector.go:127: INFO Set scan_frequency duration to 10s
2016/02/03 13:51:44.633182 prospector.go:87: INFO Input type set to: log
2016/02/03 13:51:44.633182 prospector.go:127: INFO Set backoff duration to 1s
2016/02/03 13:51:44.633182 prospector.go:127: INFO Set max_backoff duration to 10s
2016/02/03 13:51:44.633182 prospector.go:107: INFO force_close_file is disabled
2016/02/03 13:51:44.633182 prospector.go:137: INFO Starting prospector of type: log
2016/02/03 13:51:44.634182 spooler.go:77: INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2016/02/03 13:51:44.634182 crawler.go:78: INFO All prospectors initialised with 0 states to persist
2016/02/03 13:51:44.634182 registrar.go:83: INFO Starting Registrar
2016/02/03 13:51:44.634182 log.go:113: INFO Harvester started for file: C:/Programs/elastic/logs/artifactory/server.log
2016/02/03 13:51:44.634182 log.go:135: ERR Stop Harvesting. Unexpected encoding line reader error: error parsing regexp: invalid escape sequence: \d

I know all the expressions are good as they have been tested elsewhere. Does anyone know how to overcome this?

Thanks,

The first thing, which is probably not related to this error, is that filebeat only supports one multiline pattern per prospector. It seems like you want to use all 3 in one prospector, or is that just an excerpt?

We currently use POSIX regexp but are thinking about moving away from it: https://github.com/elastic/beats/issues/912

Have a look at this issue, which could help track down the problem with \d: https://github.com/elastic/beats/issues/740
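
For example, the quoting side of it looks roughly like this (just a sketch of the YAML behaviour; whether the regex engine then accepts \d is a separate question):

    # In a double-quoted scalar, \d is an unknown escape and YAML parsing fails:
    # pattern: "^(19|20)\d\d"
    # Doubling the backslash gets \d through the YAML parser:
    pattern: "^(19|20)\\d\\d"
    # In a single-quoted scalar, backslashes are not escape characters at all:
    pattern: '^(19|20)\d\d'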

Thanks, I've added another backslash, but multiline still won't work for me.

Here's my configuration:

    filebeat:
      prospectors:
        -
          paths:
            - C:/Programs/elastic/logs/artifactory/server.log
          document_type: artifactory-server
          ignore_older: 17520h
          multiline:
            pattern: ^(19|20)\\d\\d([- /.])(0[1-9]|1[012])\\2(0[1-9]|[12][0-9]|3[01])
            negate: true
            match: after
    output:
      logstash:
        enabled: true
        hosts: ["localhost:5044"]

And here's a log snippet:

2015-03-02 18:25:16,499 [art-init] [INFO ] (o.a.w.s.ArtifactoryContextConfigListener:283) -

Version: 3.5.2.1
Revision: 30160
Artifactory Home: '/var/opt/jfrog/artifactory'

2015-03-02 18:25:16,813 [art-init] [INFO ] (o.a.s.ArtifactoryApplicationContext:211) - Refreshing Artifactory: startup date [Mon Mar 02 18:25:16 CET 2015]; root of context hierarchy

Filebeat rolls up all the lines into a single event irrespective of whether the line begins with a date or not.

Could you try a simpler example just to see that it works? For example, just pick all lines that start with a number?

I whittled the expression down to:

    pattern: ^[0-9]+

And now the behaviour is as expected. However, are there plans to use the same engine as Logstash, so that grok expressions and pattern files may be used?

YAML has quite a few escaping rules for strings, depending on how the strings are written in the file (there are something like 5 different rules). I'd recommend putting regexes in single quotes.
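
For example, the first pattern from above would look like this in single quotes (just a sketch of the quoting; as it turns out further down, \d and the \2 backreference are not supported by the engine anyway, but the YAML parsing error goes away):

    multiline:
      pattern: '^(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])'
      negate: true
      match: after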

Here is a small script to test regexes on some content. Adapt pattern, negate and content to check how your regexes match.
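
Something along these lines (a sketch using Go's standard regexp package, which may accept more syntax, e.g. \d, than the engine Filebeat 1.x uses, so treat it as a rough check; adapt pattern, negate and the sample lines):

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // Adapt pattern, negate and the sample lines to your case.
        pattern := `^(19|20)[0-9][0-9]`
        negate := true
        lines := []string{
            "2015-03-02 18:25:16,499 [art-init] [INFO ] - start of an event",
            "Version: 3.5.2.1",
            "Revision: 30160",
            "2015-03-02 18:25:16,813 [art-init] [INFO ] - next event",
        }

        re := regexp.MustCompile(pattern)
        for _, line := range lines {
            match := re.MatchString(line)
            if negate {
                match = !match
            }
            // With match: after, lines where this prints true would be appended
            // to the previous event instead of starting a new one.
            fmt.Printf("append=%v line=%q\n", match, line)
        }
    }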

Yes, it seems that Filebeat's multiline feature concatenates all lines in the file into one single message if the pattern never matches, so you'd need to double-check your regular expression.

Is there any guidance for including \d, for example, in a regex in the Filebeat multiline pattern? I have the same issue and can work around it the same way as above, but it would be nice not to have to.

You can find some minimal documentation about regex support in our docs.

Thanks, I see \d and most of the other Perl character classes aren't supported. I swear I looked at that doc last week...
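
For what it's worth, here is a sketch of the first date pattern from the top of this thread rewritten without \d (and without the \2 backreference, which presumably isn't supported either), assuming character classes and alternation work the way ^[0-9]+ did above:

    pattern: '^(19|20)[0-9][0-9]([- /.])(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])'
    negate: true
    match: after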

I strongly agree that Filebeat multiline should use the same engine as Logstash, so that the regex patterns from the Logstash multiline codec can be used in the Filebeat multiline configuration. @ruflin @steffens
