Data from multiple .csv files is being added together

I have a wildcard path to ingest multiple .csv files, but in the Elasticsearch visualizations it's adding the same column items together instead of skipping repeat entries.

I want to create a data dump where new logs only add new items.

I'm pretty sure I read that it does this by default.
Is there a setting in the filter plugin I need to add?

Can you clarify what you mean by that?

I have user information entries in .csv files that are downloaded as reports.

I want those reports to be sent automatically into Logstash, adding only new records.

Currently, when I add a new report, it sees all the entries as brand new and adds them on top of the existing ones.

So if I have 30,000 items in a column and I add a near-identical report to the Logstash folder, the count jumps to 60,000, and so on.

It was my understanding that Logstash could recognize this and only add the new entries.

Nope, Logstash is not a state machine; it treats every event as unique.

You may want to look at creating a unique document ID based on some of the fields. That way if you download another file that has the same entries, Logstash will simply update the existing one in Elasticsearch and not create a new one.
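One common way to do that is the fingerprint filter: hash the fields that uniquely identify a row and use the result as the Elasticsearch document ID. A rough sketch is below; the field names (`user_id`, `email`), file path, and index name are placeholders for whatever your reports actually contain:

```
input {
  file {
    path => "/path/to/reports/*.csv"     # placeholder path for the report folder
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => ["user_id", "email", "last_login"]   # hypothetical column names
  }

  # Build a stable ID from the fields that make a row unique
  fingerprint {
    source => ["user_id", "email"]
    concatenate_sources => true
    method => "SHA256"
    target => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "user-reports"                        # placeholder index name
    document_id => "%{[@metadata][fingerprint]}"   # same row => same ID => overwrite, not duplicate
  }
}
```

Because the ID is derived from the row's content, re-ingesting the same row re-indexes the same document instead of adding a second copy, so your counts stay at 30,000.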
