Avoid duplicate documents in different indices, Logstash

Hello All,

I have a Logstash configuration that uses the output block shown below in an attempt to mitigate duplicates in an Elasticsearch index.
Logstash gets its data from a Perl script that runs on the server every "X" minutes, e.g. every 10 minutes.

This works when Logstash sees the same doc in the same index. But since the command that generates the input data doesn't
produce documents at a reliable rate, Logstash will sometimes insert duplicate docs into a different daily index (after rollover).
When the date stamp rolls over and the document still appears, Elasticsearch/Logstash treats it as a new doc.
Ideally I need only one entry per document, with its unique ID, across all the daily indices. Reason: if there is any update to the document,
the existing document with that ID should get updated, rather than a new document being created with the same ID.
Can this be achieved? Any suggestions would be helpful.

input {
   exec {
      command => '. ../scripts/viewvolumes/run_viewvolumes.sh'
      schedule => "0 */10 * * * *"
   }
}

filter {
   if [message] =~ "^\{.*\}[\s\S]*$" {
      json {
         source => "message"
         target => "parsed_json"
         remove_field => "message"
      }

      split {
         field => "[parsed_json][viewVolumesResponse]"
         target => "volume"
         remove_field => [ "parsed_json" ]
      }
   }
   else {
      drop { }
   }
}

output {
   elasticsearch {
      hosts => "http://abc09appl008.dev.dm01.group.arg:9200"
      ilm_pattern => "{now/d}-000001"
      ilm_rollover_alias => "tis-monitor-viewvolumes"
      ilm_policy => "tis-monitor-viewvolumes-policy"
      doc_as_upsert => true
      document_id => "%{[volume][volumeName]}"
   }
}

@ChinigamiHunter Can you help with this?

Thanks,

Elasticsearch guarantees that an _id is unique within each shard of an index. With the default routing, that guarantee extends to the entire index. However, there is no guarantee across indexes. For indexing purposes each index is independent, and I don't know of a way to tell Elasticsearch that it should search for and purge items from some set of (possibly read-only, rolled-over) indexes.
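
To make the per-index scope of _id concrete, here is a toy model in Python. It is only a sketch, not the Elasticsearch client: the `ToyCluster` class, the index names with their dates, and the `sizeGb` field are all made up for illustration. Each index is modelled as its own independent map from _id to document, which is enough to show why the same `document_id` ends up stored twice once the write index rolls over.

```python
from collections import defaultdict


class ToyCluster:
    """Toy stand-in for a cluster: every index is its own _id -> document map."""

    def __init__(self):
        self.indexes = defaultdict(dict)

    def index_doc(self, index, doc_id, doc):
        # Within one index, the same _id overwrites the earlier document.
        # No other index is consulted, so nothing prevents a copy elsewhere.
        self.indexes[index][doc_id] = doc

    def count(self, doc_id):
        # Number of indexes that currently hold a document with this _id.
        return sum(1 for docs in self.indexes.values() if doc_id in docs)


cluster = ToyCluster()
day1 = "tis-monitor-viewvolumes-2024.01.01-000001"  # hypothetical index name
day2 = "tis-monitor-viewvolumes-2024.01.02-000002"  # hypothetical index name

cluster.index_doc(day1, "vol1", {"sizeGb": 10})
cluster.index_doc(day1, "vol1", {"sizeGb": 12})  # same index: overwritten
cluster.index_doc(day2, "vol1", {"sizeGb": 12})  # after rollover: a second copy

print(cluster.count("vol1"))  # prints 2
```

The second write replaces the first because both target the same index, while the third write lands in the new index and leaves the older copy untouched.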

The only solution I can think of is to sort the Elasticsearch query by date and limit the result size to 1, so that only the most recent document is returned.
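
A minimal sketch of such a query body, built as a plain Python dict so it can be passed to whatever client or HTTP call you use. The field names are assumptions: `volume.volumeName.keyword` presumes the default dynamic mapping created a keyword sub-field, and `@timestamp` presumes the timestamp that Logstash normally adds to each event. The helper name `latest_doc_query` is made up for this example.

```python
import json


def latest_doc_query(volume_name,
                     name_field="volume.volumeName.keyword",  # assumed mapping
                     time_field="@timestamp"):                # assumed date field
    """Build a search body that returns only the newest document for one volume."""
    return {
        "size": 1,                                    # keep only one hit
        "query": {"term": {name_field: volume_name}},  # exact match on the name
        "sort": [{time_field: {"order": "desc"}}],     # newest first
    }


body = latest_doc_query("vol1")
print(json.dumps(body, indent=2))
```

The body would be sent to the search endpoint for the rollover alias or for a pattern that covers all the daily indices (for example `tis-monitor-viewvolumes-*`), so that copies in older indices are considered but only the newest one comes back.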
