@larghir Keys being shuffled to a single reducer is primarily a MapReduce concern. A Spark RDD writes to Elasticsearch in parallel, using however many partitions are configured. Write parallelism also depends on your RDD layout, your configuration, and the resources available in your environment.
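As an illustration, with the elasticsearch-hadoop connector each RDD partition becomes one concurrent writer, so repartitioning is the main knob for write parallelism. A minimal sketch (the index name, node address, and partition count are placeholders, and actual concurrency is capped by your executor cores):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._  // adds saveToEs to RDDs

val conf = new SparkConf()
  .setAppName("es-write-example")
  .set("es.nodes", "localhost:9200")  // assumption: ES reachable locally

val sc = new SparkContext(conf)

val docs = sc.parallelize(Seq(
  Map("id" -> 1, "msg" -> "hello"),
  Map("id" -> 2, "msg" -> "world")
))

// Each partition is written by its own task, so repartition(8) allows
// up to 8 parallel writers, subject to available executor cores.
docs.repartition(8).saveToEs("my-index")  // "my-index" is a placeholder
```

Note this only raises the Spark-side parallelism; Elasticsearch still distributes the incoming documents across shards itself, so there is no single-writer bottleneck analogous to the MapReduce single-reducer case.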