I was wondering how the bulk API works for a massive indexing job on a multi-node cluster.
As I see it, there are two possibilities:
- The Java client sends documents to the nodes of the cluster on a round-robin basis. Each node then checks the ID of each document and, if necessary, reroutes it to the node holding the correct shard.
- The Java client computes the shard ID from the document ID for each document and sends the document directly to the correct node.
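To make the second option concrete, here is a minimal sketch of what I mean by computing the shard on the client side. It assumes the target shard is a hash of the document ID modulo the number of primary shards. The class `ShardRoutingSketch`, the methods `shardFor` and `groupByShard`, and the use of `String.hashCode()` as the hash are all my own placeholders for illustration, not the actual client code or the hash function the cluster really uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardRoutingSketch {

    // Hypothetical stand-in for the routing hash: String.hashCode() is used only
    // to keep the sketch self-contained; a real cluster uses its own hash function.
    public static int shardFor(String docId, int numPrimaryShards) {
        // floorMod keeps the result in [0, numPrimaryShards) even for negative hashes.
        return Math.floorMod(docId.hashCode(), numPrimaryShards);
    }

    // Splits one bulk batch into per-shard sub-batches, so that each sub-batch
    // could be sent straight to the node holding that shard.
    public static Map<Integer, List<String>> groupByShard(List<String> docIds, int numPrimaryShards) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String id : docIds) {
            int shard = shardFor(id, numPrimaryShards);
            groups.computeIfAbsent(shard, k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4", "doc-5", "doc-6");
        // Print which document IDs would end up in which shard for a 3-shard index.
        groupByShard(ids, 3).forEach((shard, docs) ->
                System.out.println("shard " + shard + " <- " + docs));
    }
}
```

With the first option, this grouping step would instead happen on whichever node receives the bulk request.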
From looking at the source code, I think the first approach is the one implemented, which seems odd to me because I would naively expect the second approach to be more efficient...
Thank you for your answer