except for one big difference: the solution in the post above uses only Logstash, while my pipeline ships data to Logstash via Filebeat. Logstash's file input plugin has the necessary parameters (sincedb_path and ignore_older), but the csv filter doesn't. I checked the documentation, and it seems the beats input of Logstash doesn't have these parameters either.
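For reference, in the Logstash-only approach those parameters sit on the file input, roughly like this (just a sketch, the path is an example):

```
input {
  file {
    path => "/data/csv/*.csv"
    # read files from the beginning instead of tailing them
    start_position => "beginning"
    # don't persist read positions, so files can be reread
    sincedb_path => "/dev/null"
    # skip files last modified more than a day ago (value in seconds)
    ignore_older => 86400
  }
}
```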
What can I do here?...
To force Filebeat to reread files it has already encountered, you need to delete the appropriate entries from its registry file. This file is located under your data folder.
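On a typical Linux package install that would look something like the following (stop Filebeat first; the registry path varies between versions and install types, so check your own data folder):

```
sudo systemctl stop filebeat
# default data folder for deb/rpm installs; adjust to your setup
sudo rm -rf /var/lib/filebeat/registry
sudo systemctl start filebeat
```

Note that deleting the whole registry like this makes Filebeat reread and resend every monitored file, not just the one that changed.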
My pipeline is already up and running nicely: Filebeat monitors a folder for all csv files and ships the data to Logstash, where it is parsed and filtered, then sent to ES.
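The Filebeat side is essentially this (paths and host are placeholders, simplified from my actual config):

```
filebeat.inputs:
  - type: log
    paths:
      - /data/csv/*.csv

output.logstash:
  hosts: ["localhost:5044"]
```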
When an entry in a csv file has its value modified, I want the pipeline to reread it and reindex it in ES. I read everything I could find on reindexing. The problem is, as far as I can tell this is a highly manual task done through the Dev Tools section in Kibana using PUT.
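I mean something along these lines in Dev Tools, with made-up index, id and field names:

```
PUT my-csv-index/_doc/1
{
  "status": "updated value"
}
```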
I'd rather not do it that way, nor delete entries from the registry. Once a csv entry is modified, I need the change to be reflected in my Kibana dashboard. This is critical for my work.
I am afraid this is a use case we do not support. Filebeat cannot detect whether a line in a file has been modified or not. It only reads each line once (unless the entries are deleted from the registry, in which case the whole file is reread and resent).
There has to be a way. Do you intend to do something about this situation in a (near) future release?
We need to use Filebeat, filter data in Logstash using the csv filter, and have a way to rewrite/reindex entries whenever an entry in the csv changes.
Please help, any additional info is highly appreciated.
We are not planning to support this use case. The architecture of Filebeat's reader pipeline does not support processing files in this manner. So even if we decided to support it (which I doubt), it would be a massive undertaking, as the core of Filebeat was not designed this way.
ok, I'll try to do that.
I'm sure someone else has faced the same scenario at some point and had the same question in mind.
This actually became a new requirement in our pipeline, which is why we went with filebeat and not directly with logstash.
Thank you for the reply. I'll see if there's a workaround and post any updates.