Hi,
I am using a JDBC connection to fetch data from a database.
I have a single index, let's say Contacts.
I am able to fetch data from the database, but what I want is that when I run Logstash a second time it should only add the rows that are new in the database to Elasticsearch.
Explanation:
Running Logstash for the first time:
ES has 5 documents in the 'Contacts' index; the database has 5 rows.
Running Logstash a second time:
ES has 11 documents in the 'Contacts' index; the database has 6 rows.
So I am getting multiple entries of the same data. I want only the newly added data when running Logstash a second time.
This has been discussed several times before, but the idea is to not use Elasticsearch's automatically generated document id and instead set your own document id based on one or more fields that come from columns returned by the database query.
The elasticsearch output's document_id option can be used for this: if your events have, for example, an id field containing (typically) the primary key from the database, you can use document_id => "%{id}" to use that value as the document id in ES.
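A minimal pipeline sketch of that idea, assuming the query returns a primary-key column named id and a MySQL database (the connection details, table, and column names here are placeholders for illustration, not your actual setup):

```conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"  # hypothetical connection
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT id, name, email FROM contacts"           # hypothetical query
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "contacts"
    document_id => "%{id}"  # reuse the primary key as the ES document id
  }
}
```

With this in place a re-run indexes each row under the same _id, so existing documents are overwritten rather than duplicated and only genuinely new rows increase the document count.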