Remove duplicate documents that have the same field value

I'm trying to retrieve data from MongoDB and remove documents that have the same value in a specific field.

For example: if multiple documents have the same value in the field "device_uuid", I would like to filter out the duplicates and keep a single document with that field value.

I've been trying to use the fingerprint filter to do that job, but separate documents are still being created in Elasticsearch even when the "device_uuid" fields are equal. It only works when all the fields are equal; only then are the documents deduplicated.

This is my code:

  filter {
    mutate {
      # Drop the MongoDB _id so it doesn't collide with the Elasticsearch id
      remove_field => [ "_id" ]
    }
    fingerprint {
      source => ["device_uuid"]       # field(s) to hash
      target => "fingerprint"
      key => "78787878"
      method => "SHA1"
      concatenate_sources => true     # only matters when source lists several fields
    }
  }

  output {
    elasticsearch {
      index => "logstash_test"
      hosts => ["elastic_url_here"]
      document_id => "%{fingerprint}" # same fingerprint => same document id
    }

    stdout {
      codec => rubydebug
    }
  }

If you just look at the device_uuid and fingerprint fields, does the same device_uuid result in more than one fingerprint?
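A quick way to check is to run just the fingerprint filter against a few sample events and compare the two fields in the rubydebug output. This is a minimal sketch; the stdin input and the JSON test lines are assumptions for testing only, not part of your pipeline:

  input {
    stdin { }   # paste test events as JSON lines, e.g. {"device_uuid":"abc-123"}
  }

  filter {
    json { source => "message" }
    fingerprint {
      source => ["device_uuid"]
      target => "fingerprint"
      key => "78787878"
      method => "SHA1"
    }
  }

  output {
    # The same device_uuid should always print the same fingerprint;
    # if it does not, deduplication by document_id cannot work.
    stdout { codec => rubydebug }
  }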


You are absolutely right! The document with that uuid was being overwritten, but the fingerprint was the same. It was actually working!

Maybe I'm missing something here, but when an existing document id is found, Elasticsearch updates the existing document, overwriting it with the new event that is coming in.

So if the originally ingested document has field1 = Hello and a uuid of 1, and you later ingest a document with field1 = World that also has a uuid of 1, then you lose the original event that contained Hello.

Is that what you are trying to accomplish: always update a given event with the latest version and lose previously ingested data?
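If you instead want to keep the first event per device_uuid and ignore later duplicates, one option is to set the elasticsearch output's action option to "create", which refuses to index a document whose id already exists rather than overwriting it. A sketch, reusing the index and hosts from your config:

  output {
    elasticsearch {
      index => "logstash_test"
      hosts => ["elastic_url_here"]
      document_id => "%{fingerprint}"
      # "create" fails for an existing id, so the first event with a given
      # fingerprint wins; later duplicates are rejected as version conflicts
      # instead of overwriting the original document.
      action => "create"
    }
  }

The rejected duplicates show up as 409 conflict errors in the Logstash log, which is expected for this use case.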

