Hi all, I am trying to index data from MS SQL Server into Elasticsearch as nested documents. It works fine, but the challenge is that I have to remove the nested documents and the main document if the is_active column retrieved from SQL Server is false. Here is my config file:
I am trying this approach for incremental indexing:
The select query you mentioned above will retrieve only the active records from the database. In that case I won't be able to delete a row that is already indexed as a nested document in Elasticsearch. I hope that makes sense.
If I am right, I have to retrieve all rows from the database, both active and inactive, and decide for each one whether it has to be indexed or deleted. Most of the time a record gets modified or deleted in the database by an end user from the UI, or by the DB team at the backend. The records should stay in sync between the DB and Elasticsearch, which is what I am trying to achieve with Logstash.
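A minimal sketch of that per-row decision, written in Python rather than as Logstash config so the logic is easy to follow. It assumes each SQL row maps to one parent document, that the row carries a hypothetical `id` column used as the document id, and that `my_index` is a placeholder index name. It only builds the NDJSON lines that the Elasticsearch bulk API expects and does not send anything. In Logstash, the same choice is usually made by deriving the elasticsearch output's `action` from a field on the event.

```python
import json

def bulk_lines(row, index="my_index"):
    """Return the bulk-API NDJSON lines for one SQL row.

    `id` and `my_index` are illustrative names, not taken from the thread.
    """
    meta = {"_index": index, "_id": str(row["id"])}
    if not row["is_active"]:
        # Nested documents live inside the parent document, so deleting
        # the parent removes them as well. A delete action has no body line.
        return [json.dumps({"delete": meta})]
    # Active row: (re)index the whole document under the same id.
    source = {k: v for k, v in row.items() if k != "id"}
    return [json.dumps({"index": meta}), json.dumps(source)]

if __name__ == "__main__":
    rows = [
        {"id": 1, "is_active": True, "name": "kept"},
        {"id": 2, "is_active": False, "name": "removed"},
    ]
    for row in rows:
        for line in bulk_lines(row):
            print(line)
```

Because both branches use the same `_id`, an active row overwrites its earlier version and an inactive row removes it, which is what keeps the index in step with the table.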
I have included a model config that does incremental indexing, but it does not use the nested-document approach; it directly indexes all the retrieved rows.
In our case we need incremental indexing for the config above, but the scenario is quite different because we index the documents as nested using JDBC Streaming. I hope that makes sense.
Is there any reason why Elasticsearch should delete the data when it is inactive? If the source keeps it and just tags it as inactive, can't Elasticsearch do the same? I don't think incremental indexing with the same document_id will work. Use upsert instead.
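To illustrate the upsert idea, here is a rough Python sketch that builds a bulk-API `update` action with `doc_as_upsert` set, so the document is created if it is missing and updated otherwise, and the `is_active` flag is stored rather than triggering a delete. The `id` column and the `my_index` name are assumptions for the example; nothing is sent to a cluster.

```python
import json

def upsert_lines(row, index="my_index"):
    """Return the bulk-API NDJSON lines that upsert one SQL row.

    `id` and `my_index` are illustrative names, not taken from the thread.
    """
    meta = {"_index": index, "_id": str(row["id"])}
    # Keep every column, including is_active, in the stored document.
    doc = {k: v for k, v in row.items() if k != "id"}
    body = {"doc": doc, "doc_as_upsert": True}
    return [json.dumps({"update": meta}), json.dumps(body)]

if __name__ == "__main__":
    for line in upsert_lines({"id": 2, "is_active": False, "name": "soft-deleted"}):
        print(line)
```

With this approach, searches would need a filter such as a `term` query on `is_active` to hide inactive documents.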