Parsing repeated patterns in logstash

Continuing the discussion from Parsing repeated patterns:

In my log file I have the following pattern:
2014-02-19 19:13:12|oAmwic|20140219190348_rmzXR|20140219190348_rmzXR|ADR_TST|Dim_State_Trade|Default|6|Java Exception|tMSSqlOutput_1|java.sql.BatchUpdateException:

2014-02-19 19:13:12|oAmwic|20140219190348_rmzXR|20140219190348_rmzXR|916|ADR_TST|Dim_State_Trade|__a8TIEepEeG9yIseFOxTIA|0.1|Default||end|failure|80029
java.lang.RuntimeException: Child job running failed

I have defined the filter as below:

grok {
  break_on_match => false
  match => ["message", "java[.]%{GREEDYDATA}[.]%{GREEDYDATA:error_desc}Exception[:]"]
  add_tag => ["%{error_desc}"]
}


Only the first pattern is captured.

The second error, though of the same pattern, is not captured.

Correct. That is because the second line does not end in "Exception:", so only the first line will match.

Actually, both of them end with "Exception:":

java.lang.RuntimeException: Child job running failed

This is the full log:

2014-02-19 19:13:12|oAmwic|20140219190348_rmzXR|20140219190348_rmzXR|ADR_TST|Dim_State_Trade|Default|6|Java Exception|tMSSqlOutput_1|java.sql.BatchUpdateException:Transaction (Process ID 57) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.|1
2014-02-19 19:13:14|oAmwic|20140219190348_rmzXR|20140219190348_rmzXR|916|ADR_TST|Dim_State_Trade|__a8TIEepEeG9yIseFOxTIA|0.1|Default||end|failure|80029
java.lang.RuntimeException: Child job running failed
at adr_tst.schedule_dimension_2_1.Schedule_Dimension.tRunJob_11Process(

Job ENDED WITH ERROR at 2014/02/19 19:13:12 (jobId=20140216_190624_s2aZn, jobExecutionId=20140219190348_rmzXR)
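One thing worth noting from the full log: the `java.lang.RuntimeException` line and the stack trace arrive as separate lines from the timestamped event, so a grok pattern applied per line will never see them together. A multiline codec on the input can fold continuation lines into the preceding event before filtering. A minimal sketch, assuming the log is read from a file (the path here is hypothetical):

```
input {
  file {
    # Hypothetical path; substitute your actual log file location.
    path => "/var/log/jobs/schedule_dimension.log"
    codec => multiline {
      # Any line that does NOT start with a timestamp is appended
      # to the previous event (e.g. stack trace lines).
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}
```

With the stack trace merged into its parent event, a single pattern matching "Exception" can then see both errors.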

Looks like nobody has an answer... does Logstash have these limitations?

Don't use two GREEDYDATA expressions on the same line. Depending on how your expression is written the first one can easily gobble up everything the second one would've matched. Why aren't you using the csv filter to parse this?
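For a pipe-delimited log like the samples above, a csv filter along these lines could replace the grok pattern entirely. A sketch only: the column names are guesses from the sample lines, so adjust them to your actual schema.

```
csv {
  # The log fields are separated by pipes, not commas.
  separator => "|"
  # Illustrative column names inferred from the sample line; rename to match
  # what each field actually means in your jobs.
  columns => ["timestamp", "user", "job_execution_id", "root_execution_id",
              "project", "job", "context", "priority", "type",
              "origin", "error_message"]
}
```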

Thank you, Magnus!

I will definitely try the CSV filter and keep you posted.

There are various online grok parsers out there that you can use to test your patterns against. I tried yours and it does not match; even reading the expression, it doesn't make sense to me. I would suggest using one of the online parsers to get a feel for how the matching works, and to learn the many other grok patterns that are available. I agree that GREEDYDATA should be a last resort. I have been parsing logs for a long time, and I have occasionally used five or more GREEDYDATA expressions in a single filter, but it comes down to understanding your log file: does the pattern match the lines you need, and is there a more specific expression that would work?

I personally have moved away from the csv filter after hitting a few issues processing bulk data that contains blobs of text. Simply switching to the grok filter worked for me: writing the separator character between the patterns was functionally equivalent to using the csv filter with a separator.
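The grok-with-separators approach described above might look something like this for the sample lines in this thread. The field names are illustrative, not from the original post, and only the first few columns are split out; the rest is left in one field:

```
grok {
  # Each "\|" is a literal pipe separator between the field patterns,
  # playing the same role as the csv filter's separator option.
  match => {
    "message" => "%{TIMESTAMP_ISO8601:timestamp}\|%{WORD:user}\|%{NOTSPACE:job_execution_id}\|%{NOTSPACE:root_execution_id}\|%{GREEDYDATA:rest}"
  }
}
```

Using NOTSPACE (or a pattern that excludes the pipe) for the inner fields avoids the problem Magnus pointed out, where one GREEDYDATA gobbles up everything a later one should have matched.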

I agree with Magnus: watch your usage of GREEDYDATA, and use the online parsers to validate your expressions. If you do this, you will probably never need to post another parser question here :slight_smile: