While the pattern works (tested in grokconstructor), I would like input on whether anything in it needs to be changed.
Here are the 4 different sample log messages with the conditions that need to be handled:
2019-07-01 11:20:35,539 INFO [Consumer-1] com.foo.webservices.es.handler.Logger - {msg={\"processTime\":1561980034949,\"fieldA\":\"AAAAAAAAAAAA\",\"fieldB\":\"-123456\",\"fieldC\":\"Value_C\",\"fieldD\":false}, errormsg=no record found}
2019-07-01 11:20:36,942 INFO [exec-31] [opt_field=] com.foo.webservices.es.handler.Logger - {abc=foobar, def=barfoo}
2019-07-01 11:20:35,664 INFO [Consumer-2] [opt_field=opt1_13d67663-615f-4689-9af1-3fa556c84067] com.foo.webservices.es.handler.Logger - {msg={\"processTime\":1561980034694,\"fieldA\":\"AAAAAAAAA\",\"fieldB\":\"-567890\"}, hid=host_id}
2019-07-01 11:20:35,664 INFO [Consumer-2][opt_field=opt1_13d67663-615f-4689-9af1-3fa556c84067] com.foo.webservices.es.handler.Logger - {msg={\"processTime\":1561980034694,\"fieldA\":\"AAAAAAAAA\",\"fieldB\":\"-567890\"}, hid=host_id}
1st Msg: Optional field is not present. Pattern should match.
2nd Msg: Optional field is present, albeit with an empty value. Pattern should match.
3rd Msg: Optional field is present. Pattern should match.
4th Msg: Optional field is present but there is no space after thread_name. Pattern should not match.
Can someone provide insights as to whether this pattern is correct and what could be done better, if anything?
With 7.2 I get the match / not match that you say you want. For [opt_field=] it does match but no opt_field field is added to the event. That is as expected.
With 5.6.9 (the closest version I have to yours locally), I get the results that you say you want.
In general, we recommend that grok patterns are anchored (e.g., they begin with a start-of-line anchor ^ or beginning-of-input anchor \A), which allows the pattern to give up faster when it doesn't find a match starting at the beginning; without anchors, a failed match will be attempted again starting with the 2nd character in the line, and again from the third character, etc.
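To illustrate the anchoring point, here is a minimal sketch in Python rather than grok. The `PREFIX` expression is a simplified, hypothetical stand-in for the start of the real pattern, and the `match_offset` helper is made up for this example. Python's `re` engine is not the engine grok uses, but an unanchored search behaves the same way in this respect: it retries from later offsets.

```python
import re

# Hypothetical, simplified prefix: timestamp followed by the log level.
PREFIX = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} [A-Z]+ "

unanchored = re.compile(PREFIX)
anchored = re.compile(r"\A" + PREFIX)  # \A = beginning of input

good = ("2019-07-01 11:20:36,942 INFO [exec-31] [opt_field=] "
        "com.foo.webservices.es.handler.Logger - {abc=foobar, def=barfoo}")
shifted = "junk " + good  # same text, but no longer at the start of the line


def match_offset(pattern, line):
    """Return the offset where the pattern matched, or None if it did not."""
    m = pattern.search(line)
    return None if m is None else m.start()


for name, pattern in (("unanchored", unanchored), ("anchored", anchored)):
    print(name, match_offset(pattern, good), match_offset(pattern, shifted))
```

The unanchored pattern still finds a match in `shifted`, which shows that the engine moved past the first character and kept trying. On a line that never matches, it would try every offset before giving up, whereas the anchored pattern fails after the first attempt.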
The pattern may also be more performant if you use a more restrictive pattern than DATA, because the expression DATA expands to can end up capturing too much, requiring the parser to backtrack. For example, I assume that you would not expect the value to contain a closing square bracket, so you could define one or more custom patterns that exclude that character.
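As a rough sketch of that idea (in Python, not grok), the expression below replaces the DATA-style `.*?` with a negated character class, `[^\]]*`, for the bracketed values. The names `NOT_BRACKET`, `LINE`, `parse`, and the capture names `logger` and `payload` are assumptions made for this example, and the sample payloads are abbreviated. In grok the same character class could presumably be registered as a custom pattern, for instance through the grok filter's `pattern_definitions` option.

```python
import re

# Hypothetical custom pattern: any run of characters except a closing bracket.
NOT_BRACKET = r"[^\]]*"

LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<thread_name>" + NOT_BRACKET + r")\]"
    # Optional block; the leading space is required, so "][opt_field=" fails.
    r"(?: \[opt_field=(?P<opt_field>" + NOT_BRACKET + r")\])?"
    r" (?P<logger>[\w.$]+) - (?P<payload>.*)$"
)

# The four sample messages from the question, with abbreviated payloads.
SAMPLES = [
    "2019-07-01 11:20:35,539 INFO [Consumer-1] "
    "com.foo.webservices.es.handler.Logger - {msg={}, errormsg=no record found}",
    "2019-07-01 11:20:36,942 INFO [exec-31] [opt_field=] "
    "com.foo.webservices.es.handler.Logger - {abc=foobar, def=barfoo}",
    "2019-07-01 11:20:35,664 INFO [Consumer-2] "
    "[opt_field=opt1_13d67663-615f-4689-9af1-3fa556c84067] "
    "com.foo.webservices.es.handler.Logger - {msg={}, hid=host_id}",
    "2019-07-01 11:20:35,664 INFO [Consumer-2]"
    "[opt_field=opt1_13d67663-615f-4689-9af1-3fa556c84067] "
    "com.foo.webservices.es.handler.Logger - {msg={}, hid=host_id}",
]


def parse(line):
    """Return the named captures as a dict, or None when the line does not match."""
    m = LINE.match(line)
    return None if m is None else m.groupdict()


for number, line in enumerate(SAMPLES, start=1):
    fields = parse(line)
    print(number, "no match" if fields is None else fields["opt_field"])
```

Because the character class cannot run past a `]`, the engine has nothing to give back when the rest of the line fails to match. One difference from grok: Python reports the empty `opt_field` in the 2nd message as an empty string, whereas grok, as noted above, does not add the field to the event.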