TL;DR
Is the Java Bulk API useful when I'm already using Java's transport client?
For REST, I understand bulk can reduce load by saving on REST request overhead.
For Java, the transport client already maintains an open connection to the ES node, so what is the advantage of using Java's Bulk API over individual requests?
My cluster is on v2.1, and I don't see Java Bulk API documentation for versions beyond 1.5. Are we moving away from it because it is no different from individual requests?
If anything, it bundles up ES tasks to be queued, which increases the volatility (standard deviation) of the number of tasks queued per second and so makes an EsRejectedExecutionException more likely because the queue fills up.
Yes.
For Java, the transport client already maintains an open connection to the ES node, so what is the advantage of using Java's Bulk API over individual requests?
Networking requests are cheap.
That's not the biggest gain though. In 2.x, by default Elasticsearch does an fsync per request. Of course, these fsyncs hurt performance. But if you bundle a bunch of indexing operations into a single bulk request, you only pay the cost of the fsync once and thus amortize it over multiple operations. This makes a tremendous difference to performance.
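To make the amortization concrete, here is a minimal stand-alone sketch that uses only the JDK, not Elasticsearch itself. It appends operations to a file and calls FileChannel.force once per batch, so a batch size of 1 models an fsync per request and a larger batch models a bulk request. The class name FsyncAmortization, the helpers partition and writeBatches, and the "index doc" strings are all made up for illustration; the real translog is more involved.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class FsyncAmortization {

    // Split the operations into consecutive batches of at most batchSize.
    public static List<List<String>> partition(List<String> ops, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < ops.size(); i += batchSize) {
            batches.add(ops.subList(i, Math.min(i + batchSize, ops.size())));
        }
        return batches;
    }

    // Append every batch to the file and force it to disk once per batch.
    // Returns how many times force() (the fsync stand-in) was called.
    public static int writeBatches(Path file, List<List<String>> batches) throws IOException {
        int fsyncs = 0;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (List<String> batch : batches) {
                for (String op : batch) {
                    ByteBuffer buf = ByteBuffer.wrap((op + "\n").getBytes(StandardCharsets.UTF_8));
                    while (buf.hasRemaining()) {
                        channel.write(buf);
                    }
                }
                channel.force(false); // one fsync covers the whole batch
                fsyncs++;
            }
        }
        return fsyncs;
    }

    public static void main(String[] args) throws IOException {
        List<String> ops = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            ops.add("index doc " + i); // hypothetical operation payload
        }
        Path file = Files.createTempFile("translog-sketch", ".log");
        try {
            long t0 = System.nanoTime();
            int single = writeBatches(file, partition(ops, 1));
            long t1 = System.nanoTime();
            int bulk = writeBatches(file, partition(ops, 50));
            long t2 = System.nanoTime();
            System.out.println("per-operation fsyncs: " + single + ", nanos: " + (t1 - t0));
            System.out.println("bulk fsyncs: " + bulk + ", nanos: " + (t2 - t1));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With 200 operations and a batch size of 50, the per-operation run calls force 200 times and the batched run calls it 4 times. The timings depend on the disk and are only indicative.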
What is more, multiple in-flight bulk requests can share an fsync. That is, if request A and request B arrive and complete and then a call to fsync is made from the completion of A, the fsync that would have been executed because of the completion of B is skipped.
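The sharing described above can be modeled with a toy log that remembers how far it has been synced. This is only a sketch of the idea, not Elasticsearch's translog code: the class SharedSyncLog and its methods are hypothetical, and the actual fsync is replaced by a counter.

```java
public class SharedSyncLog {
    private long written = 0;   // sequence number of the last appended operation
    private long synced = 0;    // highest sequence number already covered by an fsync
    private int fsyncCalls = 0; // how many (simulated) fsyncs were actually issued

    // Append one operation and return its sequence number.
    public synchronized long append() {
        return ++written;
    }

    // Make sure the operation with this sequence number is durable.
    // Returns true if an fsync was issued, false if an earlier one already covered it.
    public synchronized boolean ensureSynced(long seq) {
        if (seq <= synced) {
            return false; // a previous fsync already covered this operation
        }
        // A real implementation would fsync the file here (assumption).
        synced = written;
        fsyncCalls++;
        return true;
    }

    public synchronized int fsyncCalls() {
        return fsyncCalls;
    }

    public static void main(String[] args) {
        SharedSyncLog log = new SharedSyncLog();
        long a = log.append(); // request A writes its operation
        long b = log.append(); // request B writes its operation
        System.out.println("A triggered fsync: " + log.ensureSynced(a));
        System.out.println("B triggered fsync: " + log.ensureSynced(b));
        System.out.println("fsync calls: " + log.fsyncCalls());
    }
}
```

In this model the fsync triggered by A also covers B's operation, because B was written before the sync ran, so B's own call returns without doing any work.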
The docs are available.
Indexing requests get queued too, and I'm really not sure what you're trying to say about volatility.