Keeping duplicates

Hi,

Is there any way to keep duplicate rows as they are, and maybe include a number that differentiates them? The duplicate records aren't really duplicates so much as multiple instances of the same item, and they are all needed. This is for CSV files.

Logstash will retain duplicate rows. Elasticsearch will too, unless you force them to share the same document ID.

Well, the thing is, if no document_id is set then Elasticsearch generates one, so the next time those Logstash config files run, the data just gets duplicated again because every row receives another new ID. The existing duplicates are valid, though; they are just multiple instances of the same object, and there is no field/column that differentiates them. So would there be a way to include, say, a field in the ID that separates the multiple instances, and then update those IDs if next time there are more or fewer of those objects?
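Outside of Logstash itself, the idea described above can be sketched in Python: give each row a content hash plus an occurrence counter, so identical rows stay distinct but re-processing the same file produces the same IDs (letting Elasticsearch overwrite rather than duplicate). This is just an illustration of the scheme, not a Logstash feature; the function name and ID format are invented for the example.

```python
import hashlib
from collections import defaultdict

def deterministic_ids(rows):
    """Assign each row a stable ID: a hash of the row's content plus
    an occurrence counter, so the second copy of 'a,1' gets suffix -1,
    the third gets -2, and so on. Re-running over the same data
    yields exactly the same IDs."""
    seen = defaultdict(int)   # content hash -> how many times seen so far
    ids = []
    for row in rows:
        digest = hashlib.sha256(row.encode("utf-8")).hexdigest()[:12]
        ids.append(f"{digest}-{seen[digest]}")
        seen[digest] += 1
    return ids
```

With IDs like these used as the document_id, indexing the same CSV twice updates the existing documents in place instead of creating new copies.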

Or what if I could add a field recording the number of occurrences of those duplicates? Then, when the Logstash file runs again, would it be possible to overwrite that number if the count of duplicates goes up or down?
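That alternative, collapsing identical rows into one document that carries an occurrence count, can be sketched like this (again a hypothetical illustration in Python, not something the thread's Logstash setup provides; the field name "occurrences" is made up for the example):

```python
import hashlib
from collections import Counter

def rows_with_counts(rows):
    """Collapse identical rows into one document keyed by a content
    hash, with an 'occurrences' field. Re-running over a file that
    has more or fewer copies of a row simply produces a new count,
    which overwrites the old document under the same ID."""
    docs = {}
    for row, n in Counter(rows).items():
        doc_id = hashlib.sha256(row.encode("utf-8")).hexdigest()[:12]
        docs[doc_id] = {"row": row, "occurrences": n}
    return docs
```

Because the document ID depends only on the row's content, a rerun with a different number of duplicates updates the count in place rather than adding new documents.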

I do not see a solution in that case.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.