We are currently using the Bulk API to insert documents during the initial load of a new index. When it comes to updates and keeping the index in sync with our database, should we still be using the Bulk API, or should we just use the Update API?
I have read a fair bit about this and there seems to be some contradiction. Obviously with the Update API approach you are going to send a lot more requests to the cluster; however, once replicas and the refresh interval are set, the Bulk API might not be the best approach either.
One thing I was considering was queuing our updates until a certain threshold is met (or, if the threshold isn't met, until a certain time has elapsed) and then sending them with the Bulk API, but I wanted to see if there are any other approaches or best practices.
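The "queue until a size threshold or a timeout, then bulk" idea can be sketched as a small buffer. This is a minimal sketch, not an official client feature: the `BulkUpdateBuffer` class and its `send` callback are hypothetical, and `send` stands in for whatever actually POSTs the body to the `_bulk` endpoint (with `Content-Type: application/x-ndjson`). It also assumes partial-document updates with `doc_as_upsert`, so a document missing from the index is created from the partial doc, and it coalesces repeated updates to the same ID (top-level fields only) so each flush sends at most one action per document.

```python
import json
import time
from typing import Callable, Dict, List, Optional


class BulkUpdateBuffer:
    """Hypothetical helper: queue updates, flush them as one _bulk body.

    Flushes when `max_actions` distinct documents are queued, or (via tick())
    once `max_age_s` seconds have passed since the first queued update.
    """

    def __init__(self, index: str, send: Callable[[str], None],
                 max_actions: int = 500, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.index = index
        self.send = send  # assumption: POSTs the NDJSON body to /_bulk
        self.max_actions = max_actions
        self.max_age_s = max_age_s
        self.clock = clock
        self._pending: Dict[str, dict] = {}  # doc id -> merged partial doc
        self._first_queued: Optional[float] = None

    def add(self, doc_id: str, fields: dict) -> None:
        if self._first_queued is None:
            self._first_queued = self.clock()
        # Coalesce repeated updates to one document (shallow, top-level merge).
        self._pending.setdefault(doc_id, {}).update(fields)
        if len(self._pending) >= self.max_actions:
            self.flush()

    def tick(self) -> None:
        # Call periodically so a partial batch still goes out after max_age_s.
        if (self._first_queued is not None
                and self.clock() - self._first_queued >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        lines: List[str] = []
        for doc_id, doc in self._pending.items():
            # Each bulk update is an action line followed by a payload line.
            lines.append(json.dumps({"update": {"_index": self.index, "_id": doc_id}}))
            lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
        body = "\n".join(lines) + "\n"  # a _bulk body must end with a newline
        self._pending.clear()
        self._first_queued = None
        self.send(body)
```

Coalescing by ID is a deliberate choice: if the database touches the same row several times within one window, only the merged result is indexed, which reduces both request volume and indexing work compared with forwarding every change.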
Yeah, currently we are using the Bulk API for updates as well. We are finding that we get CPU spikes when we create and populate a new index while at the same time keeping the current 'live' index up to date, and after a while this causes the Elasticsearch node to go red. One option I was looking at was reducing the number of records in each sync batch; however, as I say, I have read some blogs that say that once the refresh interval and replicas are set you shouldn't use the Bulk API anymore.
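A commonly suggested way to reduce the load of populating a new index alongside a live one is to relax refresh and replication on the *new* index only while it is being filled, then restore them before switching traffic. The sketch below just builds the REST calls as `(method, path, body)` tuples; the function name `reindex_settings_plan`, the index name, and the restored values (`1s`, one replica) are assumptions to adjust to your own defaults. `index.refresh_interval` and `index.number_of_replicas` are standard dynamic index settings.

```python
from typing import List, Tuple

Request = Tuple[str, str, dict]  # (HTTP method, path, JSON body)


def reindex_settings_plan(new_index: str, replicas: int = 1,
                          refresh_interval: str = "1s") -> List[Request]:
    """Hypothetical helper: settings changes around a bulk load of new_index.

    The live index is not touched; only the index being populated is relaxed.
    """
    settings_path = f"/{new_index}/_settings"
    return [
        # Before loading: disable periodic refreshes and skip replica copies.
        ("PUT", settings_path, {"index": {"refresh_interval": "-1",
                                          "number_of_replicas": 0}}),
        # ... bulk-load the new index here ...
        # After loading: restore settings so replicas are built once, at the end.
        ("PUT", settings_path, {"index": {"refresh_interval": refresh_interval,
                                          "number_of_replicas": replicas}}),
        # Make the loaded documents searchable before switching traffic over.
        ("POST", f"/{new_index}/_refresh", {}),
    ]
```

With this split, the live index keeps its normal refresh and replica settings for the ongoing sync traffic, while the one-off load into the new index avoids repeated refreshes and per-document replication.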