Is there any way to keep duplicate rows as they are, and maybe include a number that differentiates them? The duplicate records aren't really duplicates so much as multiple instances of the same item, and they are all needed. This is for CSV files.
Well, the thing is: if no document_id is set, Elasticsearch generates one, but the next time those Logstash config files run, the data just gets duplicated because every row is assigned another new ID. The duplicates that already exist are valid, since they are just multiple instances of the same object, but there is no field/column that differentiates them. So would there be a way to include some kind of field in the ID that separates the multiple instances, and then update those IDs if there are more or fewer of those objects the next time?
Or what if I could add a field that records the number of occurrences of those duplicates? Then, when the Logstash file runs again, could that number simply be overwritten if the number of duplicates increases or decreases?
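One common pattern for this (a sketch, not something from your pipeline) is to build a deterministic fingerprint of each row's content and then append a per-fingerprint occurrence counter, so the first, second, and third identical rows get IDs like `<hash>-1`, `<hash>-2`, `<hash>-3`. Re-running the same file then overwrites the same documents instead of inserting new ones. The field names (`[@metadata][fp]`, `[@metadata][fp_seq]`) and the index name are made up for illustration; note the counter is only reliable with a single pipeline worker (`pipeline.workers: 1`), since the ruby filter keeps state in memory:

```
filter {
  # Hash the whole raw line so identical rows produce identical fingerprints
  fingerprint {
    source => ["message"]
    target => "[@metadata][fp]"
    method => "SHA256"
  }
  # Count how many times this fingerprint has been seen in this run,
  # so repeated rows get sequence numbers 1, 2, 3, ...
  ruby {
    init => "@seen = Hash.new(0)"
    code => "
      fp = event.get('[@metadata][fp]')
      @seen[fp] += 1
      event.set('[@metadata][fp_seq]', @seen[fp])
    "
  }
}
output {
  elasticsearch {
    index       => "my-csv-index"   # hypothetical index name
    document_id => "%{[@metadata][fp]}-%{[@metadata][fp_seq]}"
  }
}
```

With this scheme, if a later run contains fewer instances of a row, the higher-numbered documents from the previous run are not overwritten and would linger, so you would still need to delete stale documents (for example by reindexing into a fresh index per run) if the count can shrink.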