except for one big difference: the solution in the post above uses only Logstash, while my pipeline ships data to Logstash via Filebeat. Logstash's file input plugin has the necessary parameters (sincedb_path and ignore_older), but the csv filter doesn't. I checked the documentation, and it seems the beats input of Logstash doesn't have these parameters either.
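For reference, in the Logstash-only approach those parameters sit on the file input, roughly like this (just a sketch, the path is an example):

```
input {
  file {
    path => "/data/csv/*.csv"
    # read files from the beginning instead of tailing them
    start_position => "beginning"
    # don't persist read positions, so files can be reread
    sincedb_path => "/dev/null"
    # skip files last modified more than a day ago (value in seconds)
    ignore_older => 86400
  }
}
```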
What can I do here?...
To force Filebeat to reread files it has already encountered, you need to delete the appropriate entries from its registry file. This file is located under your data folder.
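On a typical Linux package install that would look something like the following (stop Filebeat first; the registry path varies between versions and install types, so check your own data folder):

```
sudo systemctl stop filebeat
# default data folder for deb/rpm installs; adjust to your setup
sudo rm -rf /var/lib/filebeat/registry
sudo systemctl start filebeat
```

Note that deleting the whole registry like this makes Filebeat reread and resend every monitored file, not just the one that changed.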
My pipeline is already up and running nicely: Filebeat monitors a folder for all csv files and ships the data to Logstash, where it is parsed and filtered, then sent to ES.
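The Filebeat side is essentially this (paths and host are placeholders, simplified from my actual config):

```
filebeat.inputs:
  - type: log
    paths:
      - /data/csv/*.csv

output.logstash:
  hosts: ["localhost:5044"]
```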
When an entry in a csv file has its value modified, I want the pipeline to reread it and reindex it in ES. I read everything I could find on reindexing. The problem is, as far as I can tell this is a highly manual task done through the Dev Tools section in Kibana using PUT.
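I mean something along these lines in Dev Tools, with made-up index, id and field names:

```
PUT my-csv-index/_doc/1
{
  "status": "updated value"
}
```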
I'd rather not do it that way, nor delete entries from the registry. Once a csv entry is modified, I need the change to be reflected in my Kibana dashboard. This is critical for my work.
I am afraid this is a use case we do not support. Filebeat cannot detect whether a line in a file has been modified or not. It only reads each line once (unless the entries are deleted from the registry, in which case the whole file is reread and resent).
There has to be a way. Do you intend to do something about this situation in a (near) future release?
We need to use Filebeat, filter data in Logstash using the csv filter, and have a way to rewrite/reindex entries whenever an entry in the csv changes.
Please help, any additional info is highly appreciated.
We are not planning to support this use case. The architecture of Filebeat's reader pipeline does not support processing files in this manner. So even if we decided to support it (which I doubt), it would be a massive undertaking, as the core of Filebeat was not designed this way.
ok, I'll try to do that.
I'm sure someone else has faced the same scenario at some point and had the same question in mind.
This actually became a new requirement in our pipeline, which is why we went with filebeat and not directly with logstash.
Thank you for the reply. I'll see if there's a workaround and post any updates.