Suppose we have a Spark cluster with Elasticsearch running on each node.
We've created an RDD of documents somehow and want to write them to Elasticsearch.
Will the documents be written locally? That is, if a partition lives on node1, will all of its documents be sent to the Elasticsearch instance on node1, with no documents transferred between hosts?
Reading the code (RestService):
    // currentSplit is partitionId
    // check invalid splits (applicable when running in non-MR environments) - in this case fall back to Random..
    int selectedNode = (currentSplit < 0) ? new Random().nextInt(nodes.size()) : currentSplit % nodes.size();
If I understand this correctly, this is just round-robin over the node list: the target node depends only on the partition id, not on where the partition lives, so the data will be sent across the network.
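To make my reading concrete, here is a minimal standalone sketch of that selection logic. Only the ternary expression comes from the quoted code; the class name `NodeSelectionSketch`, the `selectNode` helper, and the host names are made up for illustration, and I'm assuming the node list is the same on every task.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class NodeSelectionSketch {

    // Same expression as in the quoted snippet: modulo on the split id,
    // with a random pick when the split id is invalid (negative).
    public static int selectNode(int currentSplit, int nodeCount) {
        return (currentSplit < 0) ? new Random().nextInt(nodeCount) : currentSplit % nodeCount;
    }

    public static void main(String[] args) {
        // Hypothetical host names, not taken from any real cluster.
        List<String> nodes = Arrays.asList("node1", "node2", "node3");

        // The chosen node is a function of the partition id and the list size only;
        // nothing here looks at the host the partition is actually processed on.
        for (int partitionId = 0; partitionId < 6; partitionId++) {
            String target = nodes.get(selectNode(partitionId, nodes.size()));
            System.out.println("partition " + partitionId + " -> " + target);
        }
    }
}
```

If this matches what actually happens, a partition processed on node1 would only be written to node1 when its id happens to map to node1's position in the list.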