When to use the Bulk API

(Chris Ford) #1


We are currently using the Bulk API to insert documents on the initial load of a new index. When it comes to updates and keeping the index in sync with our database, should we keep using the Bulk API, or should we just use the Update API?

I have read a fair bit about this and the advice seems contradictory. Obviously, with the Update API approach the cluster receives a lot more requests; however, once replication and the refresh interval are set, the Bulk API might not be the best approach either.

One thing I was considering was queuing our updates until a certain threshold is met (or, if the threshold isn't met, until a certain time has elapsed) and then sending them with the Bulk API, but I wanted to see if there are any other approaches or best practices.
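To make the queuing idea concrete, here is a minimal sketch in Python of a buffer that flushes when either a size threshold or an age threshold is reached. Everything in it is an assumption for illustration: the `UpdateBuffer` name, the default thresholds, and the `flush` callback, which stands in for whatever code actually sends the batch to the Bulk API.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class UpdateBuffer:
    """Hypothetical helper: queue updates, flush on size or age threshold."""

    def __init__(
        self,
        flush: Callable[[List[Dict[str, Any]]], None],
        max_items: int = 500,             # assumed threshold, tune for your cluster
        max_age_seconds: float = 5.0,     # assumed threshold, tune for your cluster
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_items = max_items
        self._max_age = max_age_seconds
        self._clock = clock
        self._items: List[Dict[str, Any]] = []
        self._first_added_at: Optional[float] = None

    def add(self, item: Dict[str, Any]) -> None:
        # Remember when the oldest queued item arrived.
        if not self._items:
            self._first_added_at = self._clock()
        self._items.append(item)
        self.flush_if_due()

    def flush_if_due(self) -> bool:
        """Flush if the batch is full or the oldest item is too old."""
        if not self._items:
            return False
        too_many = len(self._items) >= self._max_items
        too_old = self._clock() - self._first_added_at >= self._max_age
        if too_many or too_old:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        if not self._items:
            return
        # Swap the list out first so a failing callback cannot resend it twice.
        batch, self._items = self._items, []
        self._first_added_at = None
        self._flush(batch)
```

Note that the time threshold is only checked when `add` or `flush_if_due` is called, so a quiet period needs something (a timer or a scheduled job) that calls `flush_if_due` periodically. The sketch is also not thread-safe; a lock would be needed if several workers share one buffer.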

Thanks in advance


(Russ Cam) #2

Have you considered sending updates with the Bulk API?
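For reference, a bulk request can carry `update` actions as well as `index` actions: each update is an action line followed by a line holding the partial document, in newline-delimited JSON. Below is a rough sketch in Python of building such a body. The function name, the `products` index, and the field values are made up for illustration, and older Elasticsearch versions also expect a `_type` in the action metadata, so check this against the version you run.

```python
import json
from typing import Any, Dict


def build_bulk_update_body(index: str, partial_docs: Dict[str, Dict[str, Any]]) -> str:
    """Hypothetical helper: build an NDJSON body of partial-document updates.

    partial_docs maps a document id to the fields that should change.
    """
    lines = []
    for doc_id, fields in partial_docs.items():
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Source line: the partial document to merge into the existing one.
        lines.append(json.dumps({"doc": fields}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_update_body("products", {"1": {"price": 10}, "2": {"in_stock": False}})
```

The resulting string would be sent to the `_bulk` endpoint with the `application/x-ndjson` content type, either over HTTP directly or through a client library.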

(Chris Ford) #3

Hi Russ,

Yeah, currently we are using the Bulk API for updates as well. We are finding that we get CPU spikes when we create and populate a new index while at the same time keeping the current 'live' index up to date, which after a while causes the Elasticsearch node to go red. One option I was looking at was reducing the number of records in the sync process; however, as I said, I have read some blogs that say once the refresh interval and replicas are set you shouldn't use the Bulk API anymore.


