I am using the es-hadoop library to write data from EMR to AWS Elasticsearch. The input data is around 4 GB and the input split size is 64 MB. The job is map-only and writes through `EsOutputFormat`. I can see that es-hadoop creates 28 reduce tasks, but only the first reduce task actually writes to the Elasticsearch index. My index has the default of 5 shards. Since just one task is actually writing to AWS ES, the job takes a long time to complete.

How does es-hadoop determine the number of map and reduce tasks from the Hadoop input splits and the Elasticsearch index shards? Also, could this be caused by the performance degradation that comes with turning off node discovery on WAN setups such as AWS Elasticsearch?
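For context, the relevant es-hadoop settings look roughly like this (the endpoint, index, and type names below are placeholders, not my actual values; by "turning off node discovery" I mean `es.nodes.wan.only`, since AWS Elasticsearch does not expose its individual data nodes):

```properties
# es-hadoop connector settings (sketch; endpoint and index are placeholders)
es.nodes=https://my-domain.us-east-1.es.amazonaws.com
es.port=443
es.resource=my-index/my-type
# AWS ES sits behind a single endpoint, so node discovery is disabled
# and all writes are routed through that endpoint
es.nodes.wan.only=true
```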