I use Logstash to filter data from big files like mysqld.log and mongos.log. It receives thousands of entries from these logs, and I use grok to filter the data and display it in Kibana.
Logstash receives many entries that I don't really care about, which is why those entries fail to be parsed by grok and get tagged with _grokparsefailure. They take up a huge amount of space. I thought about having Logstash drop them right away, but that would make debugging impossible.
Is it possible to have entries that are tagged _grokparsefailure deleted after 1 day? Thanks in advance!
I think the easiest way to delete them afterwards would be to create a cron job that calls the Delete by query API, filtering the data with a term query and a range query.
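Something along these lines (a minimal sketch, assuming Elasticsearch is reachable at localhost:9200 without authentication, that events keep the default @timestamp field, and that the index pattern is filebeat-*; adjust all of those to your setup):

```python
import requests

ES_URL = "http://localhost:9200"   # assumption: local cluster, no auth
INDEX_PATTERN = "filebeat-*"       # assumption: adjust to your index names

query = {
    "query": {
        "bool": {
            "filter": [
                # term query: only documents tagged with the grok failure tag
                {"term": {"tags": "_grokparsefailure"}},
                # range query: only documents older than one day
                {"range": {"@timestamp": {"lt": "now-1d"}}},
            ]
        }
    }
}

resp = requests.post(
    f"{ES_URL}/{INDEX_PATTERN}/_delete_by_query",
    json=query,
    timeout=60,
)
resp.raise_for_status()
print(resp.json().get("deleted"), "documents deleted")
```

Running that from a daily cron job would clear out anything tagged _grokparsefailure that is older than 24 hours.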
That requires the doc_id. Is it possible to generate a list of files with the doc_ids of documents that are tagged _grokparsefailure and are older than a day?
Do you know if the syntax accepts wildcards? I tested it on a demo environment, and in production the index is different every day. For example, the index for today is filebeat-7.9.0-2020.09.01.
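For what it's worth, the index part of the _delete_by_query URL does accept wildcard patterns, so a hypothetical sketch like this would target every daily index at once (same assumptions as above about the local cluster and field names):

```python
import requests

# The wildcard matches every daily index, e.g.
# filebeat-7.9.0-2020.09.01, filebeat-7.9.0-2020.09.02, and so on.
resp = requests.post(
    "http://localhost:9200/filebeat-7.9.0-*/_delete_by_query",  # assumption: local cluster, no auth
    json={"query": {"bool": {"filter": [
        {"term": {"tags": "_grokparsefailure"}},
        {"range": {"@timestamp": {"lt": "now-1d"}}},
    ]}}},
    timeout=60,
)
print(resp.json())
```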