I am looping through each record in a CSV file and inserting new records into Elasticsearch while updating existing ones. For example, if the CSV file has 5000 records, only a few end up updated in the index and the rest are missed. My insert/update script runs every 5 minutes. Is there a way to index all the records without missing any?
I am inserting the data as a pandas DataFrame using Python.
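Roughly, the indexing loop looks like this (a simplified sketch; the host, index name, and the `record_id` column are placeholders, not my exact values):

```python
import pandas as pd
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
df = pd.read_csv("records.csv")

for _, row in df.iterrows():
    doc = row.to_dict()
    # "record_id" stands in for whatever column is used as the document id
    es.index(index="my-index", id=doc["record_id"], document=doc)
```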
The number of rows in the CSV file does not match the document count in the index after the insertion.
No, I don't refresh the index.
You are using the same _id for some documents, so those documents are updated instead of being created as new ones.
There are errors for some documents, but you are not looking at them.
You are calling _search immediately after the last index operation, and because you are not refreshing the index manually, the last batch of documents is not searchable yet.
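To see which of these is happening, something like the following sketch can help (it uses the `elasticsearch` Python client's bulk helper; the index name and the `record_id` id column are just examples, adjust them to your data). It reports every per-document failure instead of dropping it silently, and it refreshes the index before counting:

```python
import pandas as pd
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
df = pd.read_csv("records.csv")

def actions(frame, index_name):
    for _, row in frame.iterrows():
        doc = row.to_dict()
        yield {
            "_op_type": "index",       # overwrites the document if the _id already exists
            "_index": index_name,
            "_id": doc["record_id"],   # duplicate ids here mean updates, not new documents
            "_source": doc,
        }

# raise_on_error=False returns the failures so they can be inspected
success, errors = helpers.bulk(es, actions(df, "my-index"), raise_on_error=False)
print(f"indexed: {success}, failed: {len(errors)}")
for err in errors:
    print(err)

# make the just-indexed documents searchable before counting them
es.indices.refresh(index="my-index")
print(es.count(index="my-index")["count"])
```

If the failed count is non-zero, the error entries will tell you why documents are missing; if it is zero but the count is still short of the CSV row count, duplicate _id values are the likely cause.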