Running into a problem caused by our developers not using a consistent date/time format.
Jul 06, 2017 6:24:58 PM org.apache.catalina.startup.HostConfig undeploy
INFO: Undeploying context
2017-07-06 18:24:59.103 [ContainerBackgroundProcessor[StandardEngine[Catalina]]] WARN o.s.b.f.s.DisposableBeanAdapter - Invocation of destroy method failed on bean with name 'ehcache': java.lang.NoClassDefFoundError: net/sf/ehcache/store/chm/SelectableConcurrentHashMap$KeySet
Jul 06, 2017 6:24:59 PM org.apache.catalina.loader.WebappClassLoaderBase clearReferencesThreads
SEVERE: The web application appears to have started a thread named [Statistics Thread] but has failed to stop it. This is very likely to create a memory leak.
Is there any "or" functionality in the multiline pattern? If not, our filebeat deployment is probably DOA. I need to be able to match EITHER date/time format to detect the end of multiline entries.
For this use-case I'd use an or-expression (expr1)|(expr2).
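As a sketch of that alternation (filebeat compiles multiline.pattern with Go's RE2-style regexp engine, but these particular expressions behave the same under Python's re, which makes them easy to try out), here's one branch per timestamp format from the logs above, each with its own ^ anchor:

```python
import re

# (expr1)|(expr2): either branch marks the start of a new multiline event.
pattern = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})'              # 2017-07-06 18:24:59.103
    r'|^([A-Z][a-z]{2} \d{2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M)'   # Jul 06, 2017 6:24:58 PM
)

print(bool(pattern.match('Jul 06, 2017 6:24:58 PM org.apache.catalina...')))  # True
print(bool(pattern.match('2017-07-06 18:24:59.103 [http-apr-8080] DEBUG')))   # True
print(bool(pattern.match('    at java.lang.Thread.run(Thread.java:745)')))    # False
```

Note the ^ repeated in each branch: a single leading ^ would only anchor the first alternative.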
Fortunately this is still a simple use-case, but feel free to open an enhancement request to configure multiple multiline state-machines per log-file (in case things become more complicated).
Using the pattern above, the data I see in Kibana doesn't make sense. As near as I can tell, the regex does detect the date strings, but are wildcards required to match the rest of the line? I tried that, and it almost works, but I'm seeing splits in odd places. Pulling from the log file directory, I find the message below. When I check in Kibana, the section in bold is where the message begins, and it ends 5 lines later: not on the next date field, but just before the words "Caused by:", which don't match the filter at all.
2017-07-07 03:11:04.865 [http-apr-8080-exec-147] [Ar0zklkfR9Kjh9tb7DLpEw] DEBUG c.r.a.s.jersey.log.LoggingFilter - WebRequestLog{httpMethod=POST, url=http://oep63-sigma-s1-core-int.samsmk.net:8080/a
pi/v2/authentication/sa, protocol=http, request={ "userGuid":"cknv1piopp", "deviceIdentifiers":{"imei":"99000697260243", "modelCode":"",
"macAddress":"",
@
/AprEndpoint$SocketProcessor
at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180) [guice-servlet-4.0-beta5.jar:na]
at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) [guice-servlet-4.0-beta5.jar:na]
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) [guice-servlet-4.0-beta5.jar:na]
at com.rosetta.adk.hibernate.filter.MultiPersistFilter.doFilter(MultiPersistFilter.java:75) [adk-hibernate-1.0.26.jar:na]
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) [guice-servlet-4.0-beta5.jar:na]
at com.yammer.metrics.web.WebappMetricsFilter.doFilter(WebappMetricsFilter.java:76) [metrics-web-2.2.0.jar:na]
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) [guice-servlet-4.0-beta5.jar:na]
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) [guice-servlet-4.0-beta5.jar:na]
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) [guice-servlet-4.0-beta5.jar:na]
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) [guice-servlet-4.0-beta5.jar:na]
at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) [guice-servlet-4.0-beta5.jar:na]
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) [guice-servlet-4.0-beta5.jar:na]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) [catalina.jar:7.0.65]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.65]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) [catalina.jar:7.0.65]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) [catalina.jar:7.0.65]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505) [catalina.jar:7.0.65]
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956) [catalina.jar:7.0.65]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) [catalina.jar:7.0.65]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) [catalina.jar:7.0.65]
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956) [catalina.jar:7.0.65]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) [catalina.jar:7.0.65]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423) [catalina.jar:7.0.65]
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079) [tomcat-coyote.jar:7.0.65]
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625) [tomcat-coyote.jar:7.0.65]
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2522) [tomcat-coyote.jar:7.0.65]
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2511) [tomcat-coyote.jar:7.0.65]
**at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_20]**
**at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_20]**
**at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-coyote.jar:7.0.65]**
**at java.lang.Thread.run(Thread.java:745) [na:1.8.0_20]**
Caused by: com.rosetta.samsmk.oep.exception.application.DeviceNotFoundException: Invalid imei
at com.rosetta.samsmk.oep.registration.service.NahqDeviceValid
Your regex looks somewhat complicated to me... well, regexes always look complicated...
No need to be too strict in the regex: don't try to parse the date, just capture its 'shape'.
Also, I'm not sure the ^ anchor you used actually applies to all of the alternative patterns.
One can simulate variables in beats configuration files like this (I didn't test this, hope it works ;)):
filebeat.prospectors:
- type: log
  ...
  multiline.pattern: '^${patterns.timestamp}'

patterns:
  timestamp: '((${patterns.timestamp1})|(${patterns.timestamp2}))'
  # captures dates of type '2017-01-01 01:02:03.456'
  timestamp1: '\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
  # captures dates of type 'Jan 01, 2017 1:23:45 AM' or 'Jan 01, 2017 11:23:45 PM'
  timestamp2: '[JFMAMSOND][a-z]{2} \d{2}, \d{4} \d\d?:\d{2}:\d{2} [AP]M'
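Whether or not the ${patterns.*} expansion works as hoped, the hand-expanded combined regex can be checked directly; Python's re is close enough to Go's RE2 for these patterns to serve as a quick test bench:

```python
import re

# Hand-expanded form of patterns.timestamp above (dot escaped, [AP]M uppercase).
timestamp1 = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
timestamp2 = r'[JFMAMSOND][a-z]{2} \d{2}, \d{4} \d\d?:\d{2}:\d{2} [AP]M'
timestamp = re.compile(r'^((%s)|(%s))' % (timestamp1, timestamp2))

# Samples taken from the log excerpts in this thread.
assert timestamp.match('2017-07-07 03:11:04.865 [http-apr-8080-exec-147] DEBUG ...')
assert timestamp.match('Jul 06, 2017 6:24:58 PM org.apache.catalina.startup.HostConfig')
assert not timestamp.match('Caused by: com.rosetta.samsmk.oep.exception...')
```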
With things becoming more complicated and devs potentially adding more quirks, you might consider some automated testing support:
- have a corpus of logs
- have a corresponding JSON file with the expected events

We do this for filebeat modules. Every module has a defined test corpus (e.g. the apache access module) that we run as part of our system tests (it would be cool if filebeat had a script to run these kinds of tests automatically). The tests use the file output in filebeat to load the events and compare them with the expected output.
Having automated tests gives you a chance to catch breakages when you have to adapt/refine the regular expression.
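A minimal sketch of such a corpus test (hypothetical: real filebeat module tests run filebeat itself with the file output and compare JSON events, but the grouping logic the pattern drives can be simulated in a few lines):

```python
import re

# Same alternation a multiline.pattern would use.
TIMESTAMP = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})'
                       r'|^([A-Z][a-z]{2} \d{2}, \d{4} \d\d?:\d{2}:\d{2} [AP]M)')

def group_events(lines):
    """Group raw log lines into multiline events: a matching line starts a new event."""
    events, current = [], []
    for line in lines:
        if TIMESTAMP.match(line) and current:
            events.append('\n'.join(current))
            current = []
        current.append(line)
    if current:
        events.append('\n'.join(current))
    return events

# A tiny corpus modelled on the log excerpts in this thread.
corpus = [
    '2017-07-07 03:11:04.865 [exec-147] DEBUG WebRequestLog{...}',
    '    at com.google.inject.servlet.ServletDefinition.service(...)',
    'Caused by: DeviceNotFoundException: Invalid imei',
    'Jul 06, 2017 6:24:58 PM org.apache.catalina.startup.HostConfig undeploy',
]
events = group_events(corpus)
assert len(events) == 2  # the 'Caused by:' line stays attached to the first event
```

Comparing `events` against a checked-in list of expected events is then an ordinary equality assertion, so pattern regressions show up immediately.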
Not sure I understand the line-splitting issue, but we've seen similar issues with events being split in multiline with the log4j RollingFileAppender. Those were caused by the appender buffering content without flush timeouts, or with timeouts that were too long (the buffer is only flushed once full). The multiline state machine in filebeat has a flush timeout, sending events if it has been waiting too long; the reason is that the state machine cannot tell the event is over when no more content is appended to the file. You can disable the timeout by setting multiline.timeout: 0. If the multiline event is the last event in the log file, it will be sent once filebeat finally closes the file.
Actually, I like your suggestion. My regex is currently a single expression with each date format separated by an "or" character; defining each date format as a separate regex in a variable would make it easier to read. I may end up building a library of regexes that can be called that way. Thanks for your advice!
btw. you can use the -c setting to load multiple configuration files. This way you can provide a set of predefined regex patterns in a separate file...
Having a list of predefined patterns in an extra file is actually what grok/logstash does. Having to type ${pattern.<name>} and such is not very convenient. Not sure how this will look in beats (I'd prefer not to have to ship too many files with a beat), but maybe we can define a default namespace to look in for regexp patterns.
Then my sample (assuming default namespace being pattern) would become:
filebeat.prospectors:
- type: log
  ...
  multiline.pattern: '^${timestamp}'

patterns:
  timestamp: '(${timestamp1}|${timestamp2})'
  # captures dates of type '2017-01-01 01:02:03.456'
  timestamp1: '(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})'
  # captures dates of type 'Jan 01, 2017 1:23:45 AM' or 'Jan 01, 2017 11:23:45 PM'
  timestamp2: '([JFMAMSOND][a-z]{2} \d{2}, \d{4} \d\d?:\d{2}:\d{2} [AP]M)'
Next step would be to ship a predefined (builtin) set of patterns with beats, though.
I have no idea how exactly this kind of feature would look in beats, but feel free to open an enhancement request for further discussion.
While convenient, the patterns generated this way are often not 'optimal', and in many of the multiline cases I've seen they are overkill (bad for performance).