I'm new to ELK, so I'm not 100% sure where this message fits exactly. It's partly Logstash and partly Elasticsearch, I guess.
What I currently have: I'm running the ELK stack locally for testing purposes. A Logstash plugin is reading emails via IMAP. Everything works as expected for the initial steps.
What I'd like to do now is to test Grok (Logstash) and html_strip (Elasticsearch) with these messages. Those emails contain a lot of unstructured information, so it will take a lot of effort to get the data extraction right. Read: many, many cycles of trial and error, I guess.
What is the best workflow for this?
Delete all the local data and get messages from IMAP again?
It would be great if I could pick a log entry and just "replay" it. But to my understanding, this might not be possible, as the original message has already been processed by Logstash and ended up in Elasticsearch.
Is there a way to "replay" a log, so that it gets replaced by the newly processed log?
There is no way to replay a log.
You can use the elasticsearch input to read from the index you already have and write to a second index, and/or use the stdout output with the rubydebug codec to quickly see whether your changes were effective. Turn on config reloading and use a long schedule (hours) so the docs are not fetched while you are editing the config. As you save the config, Logstash should restart your pipeline and fetch the ES docs for each config change. Tune the ES input query to fetch just a few docs at first, and maybe only ones with a known format, then expand the query as the grok patterns emerge.
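The idea above could be sketched roughly like this. This is only a minimal illustration, not a tested config: the index name, field name, query, and schedule are all placeholder assumptions you would adapt to your own data.

```
# Hypothetical replay pipeline: re-read already-indexed email docs
# so grok changes can be iterated on without touching IMAP again.
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "emails-raw"                      # assumed index name
    # Start narrow: only docs with a known format.
    query => '{ "query": { "match": { "subject": "invoice" } } }'
    # Long cron-style schedule so docs are not fetched mid-edit;
    # saving the config restarts the pipeline and re-fetches anyway.
    schedule => "0 3 * * *"
  }
}

filter {
  grok {
    match => { "message" => "%{GREEDYDATA:body}" }   # iterate here
  }
}

output {
  stdout { codec => rubydebug }                # quick visual check
  # elasticsearch { index => "emails-reparsed" }  # optional second index
}
```

With config reloading enabled (e.g. starting Logstash with `--config.reload.automatic`), each save of this file should restart the pipeline and re-run the fetched docs through the updated grok patterns.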
Or you can use the file output with the JSON codec to write an unaltered copy of the data taken from IMAP.
Then you can use the file input to re-read that file as you iterate on the grok development. Make sure you set the sincedb path to
"/dev/null" so the file will be read from the beginning each time. Use config reloading here too.
Good. It seems my basic understanding of the mechanics is correct, then.
I think I'll give the JSON file output approach a try. Seems easy enough.
Thanks a lot for the fast input and showing me a path through the forest.
Personally, I would advocate for the ES input option, as the query allows selecting a subset of the events while you modify the grok patterns.
The file input option is an all-you-can-eat solution.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.