@costin The Performance guide suggests that the final bulk size is batch_size * number of tasks ("Thus for a job with 5 tasks, using the defaults (1mb or 1000 docs) means up to 5mb/5000 docs bulk size"). Could you please explain how that works?
From looking at the code, it seems that EsRDDWriter.write is called for every task and creates its own instance of a RestService. Where are batches shared across tasks? Also, does creating a RestService for each task (as opposed to one per JVM) impact performance?
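To make my reading concrete, here is a toy model of how I currently understand it. I am assuming each task owns its own buffer and flushes it independently, so the 5mb/5000 docs figure would be an upper bound on what all tasks can hold at once, not one shared batch. All names below are hypothetical and are not es-hadoop classes; only the 1mb/1000 docs defaults come from the guide.

```python
# Toy model only: hypothetical names, not es-hadoop classes or API.
from dataclasses import dataclass
from typing import Tuple

BATCH_SIZE_BYTES = 1024 * 1024  # 1mb default quoted in the guide
BATCH_SIZE_ENTRIES = 1000       # 1000 docs default quoted in the guide


@dataclass
class TaskBuffer:
    """Bulk buffer owned by exactly one task (assumption: nothing is shared)."""
    docs: int = 0
    size_bytes: int = 0
    flushes: int = 0

    def add(self, doc_bytes: int) -> None:
        # Flush first if this document would push the buffer past either limit.
        too_many_docs = self.docs + 1 > BATCH_SIZE_ENTRIES
        too_many_bytes = self.size_bytes + doc_bytes > BATCH_SIZE_BYTES
        if too_many_docs or too_many_bytes:
            self.flush()
        self.docs += 1
        self.size_bytes += doc_bytes

    def flush(self) -> None:
        # Stand-in for sending one bulk request; only counts non-empty flushes.
        if self.docs:
            self.flushes += 1
        self.docs = 0
        self.size_bytes = 0


def max_in_flight(num_tasks: int) -> Tuple[int, int]:
    """Upper bound (docs, bytes) buffered at once if every task fills its own buffer."""
    return num_tasks * BATCH_SIZE_ENTRIES, num_tasks * BATCH_SIZE_BYTES
```

With 5 tasks, `max_in_flight(5)` gives 5000 docs and 5mb, which matches the numbers in the guide, but only as a total across independent buffers. Is that the intended meaning?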
I only looked at the code briefly, so I may be completely off. I would really appreciate your help understanding this.