Logstash filtering: extract data between two strings

Igor_Olikh · October 7, 2020, 1:26pm

I have a UNIX log that looks like this:

ACTION started Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s finished
when an unknown printer took a galley of type started and scrambled it to make a type specimen book. finished It has survived not only five centuries started but also the leap into electronic typesetting, remaining essentially unchanged started but also the leap into electronic typesetting, remaining essentially unchanged finished

I need to extract all the text between:

the words started and finished (e.g. "and scrambled it to make a type specimen book");
the word started and the next started, when that started has no finished (e.g. "but also the leap into electronic typesetting, remaining essentially unchanged").
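Both rules can be read as one regular expression: after each started, take the shortest run of text up to the nearest following finished or started. A lookahead keeps a closing started unconsumed, so it can open the next segment. The sketch below rests on a few assumptions. The whole record must already sit in the "message" field of a single event. A ruby filter must be acceptable in the pipeline. The target field name "segments" is hypothetical.

```
filter {
  ruby {
    code => '
      msg = event.get("message").to_s
      # /m makes "." match newlines in Ruby; the lookahead leaves the closing
      # "finished"/"started" unconsumed so a following "started" can open a new segment
      parts = msg.scan(/\bstarted\b(.*?)(?=\bfinished\b|\bstarted\b)/m).flatten.map(&:strip)
      # "segments" is a hypothetical field name holding every extracted piece
      event.set("segments", parts) unless parts.empty?
    '
  }
}
```

On the sample log, once it has been combined into one event, this should yield four entries, the second being "and scrambled it to make a type specimen book.". Text after a final started with no closing word is not captured, because the original rules do not say what to do in that case.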

Thanks.

Badger · October 7, 2020, 3:00pm

Maybe

 grok { match => { "message" => "started%{DATA:someText}(finished|started)" } }
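In this pattern "someText" is simply the name of the field that the captured text is written to. DATA is grok's built-in lazy ".*?" pattern. The slightly adjusted sketch below is offered as an untested assumption, not a verified config. "(?m)" lets the dot cross line breaks once several lines are combined into one event. "\s*" keeps the surrounding spaces out of the field. "(?:...)" avoids an extra unnamed capture. Like any single grok match, it stores only the first segment found in an event.

```
filter {
  grok {
    # "someText" is only the destination field name; DATA is a lazy .*?
    # (?m) lets DATA span newlines inside a multiline event
    match => { "message" => "(?m)started\s*%{DATA:someText}\s*(?:finished|started)" }
  }
}
```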

Igor_Olikh · October 7, 2020, 3:53pm

Thank you, Badger. I'll check it later.

Igor_Olikh · October 8, 2020, 5:57am

Hi Badger,
What is the meaning of "someText" in this case? Could you please explain?

Igor_Olikh · October 8, 2020, 1:47pm

I tried the grok. It seems to work, but if there are multiple lines between the started and finished words, Logstash creates a separate document in the index for each line.
I need the text between these two words to be in a single document.
Maybe the settings in the input plugin are incorrect?

Badger · October 8, 2020, 2:34pm

Then you would have to use a multiline codec on the input, or the multiline option in Filebeat (if you ship the logs with Filebeat), to combine the lines into one event.
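The sketch below shows the codec approach. It assumes the events come from a file input and that every new record starts with "ACTION", as in the sample log above. The path is hypothetical. With negate => true and what => "previous", every line that does not start with "ACTION" is appended to the line before it. The whole record therefore reaches the grok (or ruby) filter as one event.

```
input {
  file {
    path => "/var/log/example.log"   # hypothetical path
    start_position => "beginning"
    codec => multiline {
      # assumption: each record begins with "ACTION"; any other line
      # belongs to the previous record
      pattern => "^ACTION"
      negate => true
      what => "previous"
      # flush the last pending record after 5 s if no new "ACTION" line arrives
      auto_flush_interval => 5
    }
  }
}
```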

Igor_Olikh · October 8, 2020, 3:04pm

Thank you!

system · November 5, 2020, 3:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.