Testing Grok / Mutations on existing data?

Hello,

I'm new to ELK, so I'm not 100% sure where this message fits exactly. It's partly Logstash and partly Elasticsearch, I guess.

What I currently have: I'm running the ELK stack locally for testing purposes. A Logstash plugin is reading emails via IMAP. Everything works as expected for the initial steps.

What I'd like to do now is test Grok (Logstash) and html_strip (Elasticsearch) with these messages. Those emails contain a lot of unstructured information, so it will take a lot of effort to get the data extraction right. Read: many, many cycles of trial and error, I guess.

What is the best workflow for this?

Delete all the local data and get messages from IMAP again?

It would be great if I could pick a log entry and just "replay" it. But to my understanding, this might not be possible, as the original message has already been processed by Logstash and ended up in Elasticsearch.

Is there a way to "replay" a log, so that it gets replaced by the newly processed log?

There is no way to replay a log.

You can use the elasticsearch input to read from the index you already have and send the events to a second index, and/or to a stdout output with the rubydebug codec so you can quickly see whether your changes were effective. Turn on config reloading and use a long schedule (hours) so the docs are not fetched while you are editing the config. Each time you save the config, Logstash should restart the pipeline and fetch the ES docs again. Tune the ES input query to fetch only a few docs at first, maybe only ones with a known format, then expand the query as the grok patterns emerge. Something like the sketch below.
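A minimal sketch of that pipeline, assuming a local Elasticsearch and an index named `emails` holding the IMAP docs; the hosts, index name, query, and grok pattern are all placeholders to adapt:

```
input {
  elasticsearch {
    hosts    => ["localhost:9200"]
    index    => "emails"            # placeholder: the index your IMAP pipeline wrote to
    query    => '{ "query": { "match": { "subject": "invoice" } } }'  # start narrow, widen later
    schedule => "0 */4 * * *"       # every 4 hours, so docs aren't re-fetched mid-edit
  }
}

filter {
  grok {
    # placeholder pattern: this is the part you iterate on
    match => { "message" => "%{GREEDYDATA:body}" }
  }
}

output {
  stdout { codec => rubydebug }     # inspect the parsed events immediately
  # elasticsearch { hosts => ["localhost:9200"] index => "emails-reparsed" }  # optional second index
}
```

Start it with `bin/logstash -f test.conf --config.reload.automatic` so that saving the file triggers the pipeline restart.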

Or you can use the file output with a JSON codec to write an unaltered copy of the data taken from IMAP.
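As a sketch, added to the existing IMAP pipeline; the path is hypothetical, and json_lines (the file output's default codec) writes one event per line:

```
output {
  file {
    path  => "/tmp/imap-raw.json"   # hypothetical path, pick your own
    codec => json_lines             # one unaltered event per line
  }
}
```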

Then you can use the file input to re-read the file as you iterate on the grok development. Make sure you set the sincedb path to "/dev/null" so the file will be read from the beginning each time. Use config reloading here too.
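The re-read side, again as a sketch with the same hypothetical path:

```
input {
  file {
    path           => "/tmp/imap-raw.json"
    codec          => json            # decode each saved line back into an event
    start_position => "beginning"
    sincedb_path   => "/dev/null"     # forget the read position, so each restart re-reads the whole file
  }
}
```

Combine this with the same grok filter and rubydebug stdout output as in the sketch above.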


Good. Seems my basic understanding of the mechanics is correct, then.

I think I'll give the JSON file output approach a try. Seems easy enough.

Thanks a lot for the fast input and showing me a path through the forest. 🙂

Personally, I would advocate for the ES input option as the query allows for selecting a subset of the events while you modify the grok patterns.

The file input option is an all-you-can-eat solution.

Good luck.
