I currently have an index where I set unique IDs, and sometimes duplicate data comes in. In my use case, older data is often more accurate than newer data, so I would like Logstash to insert a document only if its _id does not already exist in the index. If the _id exists, the event shouldn't be sent to Elasticsearch; if it doesn't exist, it should be sent. Any thoughts on how I can do this?
Set the action option on the Elasticsearch output to "create" -- create indexes a document, but fails if a document with that ID already exists in the index. Hopefully Logstash does not endlessly retry on those failures.
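A minimal sketch of what that output block could look like. The index name (my-index) and the field holding the unique ID (my_unique_id) are placeholders you'd replace with your own:

```
output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "my-index"             # placeholder index name
    document_id => "%{my_unique_id}"      # assumed field containing your unique id
    action      => "create"               # fails if a document with this _id already exists
  }
}
```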
What happens specifically is that when I'm running it as a command (not as a service), it runs through the pipeline, then terminates it and starts the pipeline all over again, continually.