I'm new to ELK, so I'm not 100% sure where this message fits exactly. It's partly Logstash and partly Elasticsearch, I guess.
What I currently have: I'm running the ELK stack locally for testing purposes. A Logstash plugin is reading emails via IMAP. Everything works as expected for the initial steps.
What I'd like to do now is to test Grok (Logstash) and html_strip (Elasticsearch) with these messages. Those emails contain a lot of unstructured information, so it will take a lot of effort to get the data extraction right. Read: many, many cycles of trial and error, I guess.
What is the best workflow for this?
Delete all the local data and get messages from IMAP again?
It would be great if I could pick a log entry and just "replay" it. But to my understanding, this might not be possible, as the original message has already been processed by Logstash and ended up in Elasticsearch.
Is there a way to "replay" a log, so that it gets replaced by the newly processed log?
There is no way to replay a log.
You can use the elasticsearch input to read from the index you already have and write to a second index, and/or use the stdout output with the rubydebug codec to quickly see whether your changes were effective. Turn on config reloading and use a long schedule (hours) so the docs are not fetched while you are editing the config. As you save the config, Logstash should restart your pipeline and fetch the ES docs for each config change. Tune the ES input query to fetch just a few docs at first, and maybe only ones with a known format, then expand the query as the grok patterns emerge.
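The idea above could be sketched roughly like this. This is only a minimal illustration, not a tested config: the index name, field name, query, and schedule are all placeholder assumptions you would adapt to your own data.

```
# Hypothetical replay pipeline: re-read already-indexed email docs
# so grok changes can be iterated on without touching IMAP again.
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "emails-raw"                      # assumed index name
    # Start narrow: only docs with a known format.
    query => '{ "query": { "match": { "subject": "invoice" } } }'
    # Long cron-style schedule so docs are not fetched mid-edit;
    # saving the config restarts the pipeline and re-fetches anyway.
    schedule => "0 3 * * *"
  }
}

filter {
  grok {
    match => { "message" => "%{GREEDYDATA:body}" }   # iterate here
  }
}

output {
  stdout { codec => rubydebug }                # quick visual check
  # elasticsearch { index => "emails-reparsed" }  # optional second index
}
```

With config reloading enabled (e.g. starting Logstash with `--config.reload.automatic`), each save of this file should restart the pipeline and re-run the fetched docs through the updated grok patterns.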
Or you can use the file output with the JSON codec to write an unaltered copy of the data taken from IMAP.
Then you can use the file input to re-read that file as you iterate on the grok development. Make sure you set the sincedb path to
"/dev/null" so the file will be read from the beginning each time. Use config reloading here too.
Good. It seems my basic understanding of the mechanics is correct, then.
I think I'll give the JSON file output approach a try. Seems easy enough.
Thanks a lot for the fast input and showing me a path through the forest.
Personally, I would advocate for the ES input option, as the query allows selecting a subset of the events while you modify the grok patterns.
The file input option is an all-you-can-eat solution.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.