How many updates is too many updates, or should I bulk?

I have a single node running on a Digital Ocean VM. The server has 8 GB of RAM, and according to top the CPU is about 50% used by the java process.

My app writes often, mostly inserts and updates. I have a piece of synchronous Django code that sends every Django database save to Elasticsearch as a PUT. It does, at the median, about 10 Elasticsearch inserts/updates/deletes per 60 seconds. Is that a lot?
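To make the pattern concrete, here is a rough sketch of the kind of per-save hook I mean. The names (`push_to_elasticsearch`, `client`, the index name) are made up for illustration; `client` stands in for an elasticsearch-py `Elasticsearch` instance, and in the real app this is wired into a Django `post_save` signal:

```python
# Hypothetical sketch of synchronous per-save indexing.
# `client` is any object with an elasticsearch-py style
# index(index=..., id=..., document=...) method.

def push_to_elasticsearch(client, index, pk, document):
    # One network round-trip per Django save -- this is the PUT
    # that fires on every model save.
    return client.index(index=index, id=pk, document=document)
```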

At what point is it worth writing a buffer that collects all Elasticsearch writes into a single bulk operation?

Well, if your CPU is pegged and you are seeing frequent GCs, too many updates could be the problem. Having been bitten by this recently, I would argue for always batching your updates, or for figuring out how to avoid doing updates at all.

For inserts, I've found batching does squeeze more performance out of the system, but for me the gain has been around 10-25%.
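A minimal sketch of the buffering idea, assuming the official elasticsearch-py client. The flush step is a pluggable callable so the buffer itself doesn't depend on a live cluster; in practice you'd pass something like `lambda actions: helpers.bulk(es, actions)`. The class name and `max_actions` threshold are my own inventions:

```python
# Collect individual writes and send them to Elasticsearch in one
# bulk request once enough have accumulated.

class BulkBuffer:
    def __init__(self, flush, max_actions=100):
        self._flush = flush        # e.g. lambda a: helpers.bulk(es, a)
        self._max = max_actions
        self._actions = []

    def add(self, action):
        # `action` is a dict in the bulk-helper shape, e.g.
        # {"_op_type": "index", "_index": "myapp", "_id": 1, ...}
        self._actions.append(action)
        if len(self._actions) >= self._max:
            self.flush()

    def flush(self):
        # Send everything buffered so far as one bulk operation.
        if self._actions:
            self._flush(self._actions)
            self._actions = []
```

You'd also want to flush on a timer (or at request end) so low-traffic periods don't leave writes sitting in the buffer indefinitely.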

But the real answer is that it depends, so benchmark it and try both approaches to see what fits your particular needs.

I have no idea how to benchmark something like that. I have some graphs that give a rough line of CPU usage; I can only really hope it drops once the writes are all done in bulk.
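One low-ceremony way to compare the two approaches is to time the same set of documents written both ways. This is only a sketch: `write_one` and `write_bulk` are stand-ins for, say, a per-document `es.index(...)` call and a single `helpers.bulk(es, ...)` call against a test index.

```python
import time

def time_writes(write_one, write_bulk, docs):
    """Return (seconds for per-document writes, seconds for one bulk write)."""
    t0 = time.perf_counter()
    for doc in docs:
        write_one(doc)
    single = time.perf_counter() - t0

    t0 = time.perf_counter()
    write_bulk(docs)
    bulk = time.perf_counter() - t0
    return single, bulk
```

Run it with a few thousand representative documents against a non-production index, and also watch the node's CPU graph while each variant runs, since wall-clock time on the client won't show server-side GC pressure.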

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.