I'd like to import addresses into ES using Logstash. The thing is, I'd like to filter out the addresses that already exist in the ES index, but I can't find an ES filter plugin that lets me first check whether an address already exists in ES.
You could use the fingerprint filter plugin to generate a fingerprint that you then use as the document id. Used together with the appropriate action in the elasticsearch output, you can then e.g. upsert (insert or update) the existing document, or just keep the first copy indexed.
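As a rough, untested sketch of that idea: the field name `address`, the host, and the index name `addresses` below are placeholders I made up, so adjust them to your data. Depending on the plugin version, the fingerprint filter may also require a `key` for the SHA methods.

```
filter {
  fingerprint {
    # "address" is a hypothetical field holding the value to hash
    source => ["address"]
    # keep the hash in @metadata so it is not indexed as a document field
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumed host
    index => "addresses"                 # assumed index name
    document_id => "%{[@metadata][fingerprint]}"
    # "create" rejects a document whose id already exists, so the first copy is kept.
    # For an upsert, use action => "update" together with doc_as_upsert => true.
    action => "create"
  }
}
```

Note that this only deduplicates addresses that hash to the same value, so the source field has to be normalized identically on both sides.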
The problem is (and I forgot to mention this, sorry) that the addresses in ES are company data and the imported ones are customer data, so consistent hashes don't work. That's why I first thought a preceding query in a filter would solve my problem.
I haven't tested this idea yet, but you could e.g. use the elasticsearch filter plugin with a query template that matches your stored addresses, and then drop the new event if a match was found.
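Something along these lines, again untested. The field names `street` and `zip`, the host, and the index name are assumptions on my part, and whether the query actually matches depends on how those fields are mapped and analyzed in your index. Instead of the inline `query` string you could point `query_template` at a file containing a query DSL body.

```
filter {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumed host
    index => "addresses"                 # assumed index name
    # query string built from hypothetical event fields "street" and "zip"
    query => 'street:"%{[street]}" AND zip:"%{[zip]}"'
    # copy a field from the matching document onto the event as a marker
    fields => { "street" => "existing_street" }
  }

  # the marker is only set when the lookup found a stored address
  if [existing_street] {
    drop { }
  }
}
```

Keep in mind that this runs one query per event, so it will slow down a large import.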