When to use the Bulk API

Hi

We are currently using the Bulk API to insert documents during the initial load of a new index. When it comes to updates and keeping the index in sync with our database, should we still be using the Bulk API, or should we switch to the Update API?

I have read a fair bit about this and there seems to be some contradiction. Obviously, with the Update API approach we are going to send a lot more requests to the cluster; however, once replication and the refresh interval are set, the Bulk API might not be the best approach either.

One thing I was considering was queuing our updates until a certain threshold is met (or, if the threshold isn't met, until a certain amount of time has elapsed) and then sending them with the Bulk API, but I wanted to see whether there are any other approaches or best practices.
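The "flush on size or on timeout" idea above could be sketched roughly like this. It is a minimal illustration, not an Elasticsearch client feature: the class name, the `max_docs`/`max_wait` parameters and the `flush` callback (which would hand the batch to whatever sends your Bulk requests) are all hypothetical, and the clock is injectable only so the timing can be tested.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

# (document id, partial document) -- a hypothetical shape for one pending update
Update = Tuple[str, Dict[str, Any]]


class UpdateBatcher:
    """Hypothetical batcher: flush when max_docs updates are queued, or when
    max_wait seconds have passed since the oldest queued update."""

    def __init__(self, flush: Callable[[List[Update]], None], max_docs: int = 500,
                 max_wait: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_docs = max_docs
        self._max_wait = max_wait
        self._clock = clock
        self._pending: List[Update] = []
        self._oldest: Optional[float] = None  # time the first pending update arrived

    def add(self, doc_id: str, doc: Dict[str, Any]) -> None:
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append((doc_id, doc))
        # Size threshold reached: send immediately.
        if len(self._pending) >= self._max_docs:
            self.flush()

    def poll(self) -> None:
        """Call periodically (e.g. from a timer) to enforce the time limit."""
        if self._pending and self._clock() - self._oldest >= self._max_wait:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        # Swap the queue out before calling the callback so new updates start a fresh batch.
        batch, self._pending, self._oldest = self._pending, [], None
        self._flush(batch)
```

The time check only runs when `poll()` is called, so something (a scheduler thread, an event loop timer) has to call it regularly; the size check runs on every `add()`.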

Thanks in advance

Chris

Have you considered sending updates with the Bulk API?
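For reference, the Bulk API accepts `update` actions as well as `index` actions, so the same endpoint used for the initial load can carry sync updates. A minimal sketch of building such a request body (newline-delimited JSON for `POST /_bulk`, sent with `Content-Type: application/x-ndjson`); the index name and the choice of `doc_as_upsert` are assumptions for illustration:

```python
import json
from typing import Any, Dict, Iterable, Tuple


def bulk_update_body(index: str, updates: Iterable[Tuple[str, Dict[str, Any]]],
                     upsert: bool = True) -> str:
    """Build an NDJSON body for POST /_bulk with one update action per document."""
    lines = []
    for doc_id, doc in updates:
        # Action/metadata line: which index and document to update.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Source line: the partial document; doc_as_upsert creates it if it is missing.
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": upsert}))
    # Every line, including the last, must be terminated by a newline.
    return "".join(line + "\n" for line in lines)
```

An empty input yields an empty string, which callers should skip rather than send.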

Hi Russ,

Yeah, we are currently using the Bulk API for updates as well. We are seeing CPU spikes when we create and populate a new index while at the same time keeping the current 'live' index up to date, and after a while this causes the Elasticsearch node to go red. One option I was looking at was reducing the number of records in the sync process; however, as I said, I have read some blogs saying that once the refresh interval and replicas are set, you shouldn't use the Bulk API anymore.
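Elastic's indexing-speed guidance suggests that, for the initial load of a *new* index, refresh can be disabled (`"refresh_interval": "-1"`) and `number_of_replicas` set to 0, with both restored before the index goes live; the live index keeps its normal settings throughout. Whether this relieves the CPU spikes described here is not established in this thread. The sketch below only builds the `_settings` and `_refresh` requests around such a load; the index name and the restored values are hypothetical inputs, and nothing is actually sent.

```python
import json
from typing import List, Tuple


def reindex_settings_plan(index: str, live_replicas: int,
                          live_refresh: str = "1s") -> List[Tuple[str, str, str]]:
    """Return (method, path, body) requests to wrap an initial bulk load of a new index."""
    path = f"/{index}/_settings"
    # Before the load: no periodic refreshes and no replica copies to keep in sync.
    before = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
    # After the load: restore the values the index should have when it goes live.
    after = {"index": {"refresh_interval": live_refresh,
                       "number_of_replicas": live_replicas}}
    return [
        ("PUT", path, json.dumps(before)),
        # ... the Bulk API requests for the initial load would run here ...
        ("PUT", path, json.dumps(after)),
        # Make everything loaded so far visible to search.
        ("POST", f"/{index}/_refresh", ""),
    ]
```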

Thanks

Chris

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.