Hi, I have a question about parallelism when saving an RDD to Elasticsearch. I have an RDD (created with SparkSQL) with 1000 partitions, and an Elasticsearch index with 5 primary shards. I run my application on a Spark cluster with 3 executors. However, I only see one task (running on one executo…

Did you ever get to the bottom of this issue? I am seeing the same thing.

Hi @Pat_Humphreys, see this answer from "Performance degradation when writing to AWS elasticsearch using elasticsearch-hadoop library" (Elasticsearch): "@larghir The situation with keys being shuffled to one reducer is primarily a MapReduce case. A Spark RDD will write out …"

saveToEs Write performance (elasticsearch-spark)


larghir September 22, 2016, 11:25am 3

Hi @Pat_Humphreys,

see this answer:

I ended up using saveToEsWithMeta, with a configuration that sets at least es.batch.size.entries, es.batch.size.bytes, and es.batch.write.retry.count.
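For anyone landing here later, a minimal sketch of what such a call might look like. This is not the exact code from my job: the object name EsBulkWrite, the save helper, the resource argument, and the setting values are all assumptions for illustration (the values shown are, as far as I know, the connector defaults), so tune them against your own cluster. It assumes Spark and the elasticsearch-spark connector are on the classpath.

```scala
import org.apache.spark.rdd.RDD
import org.elasticsearch.spark._ // brings saveToEsWithMeta into scope for pair RDDs

// Hypothetical helper object; the names and values are illustrative only.
object EsBulkWrite {

  // Bulk-write settings passed per call instead of via SparkConf.
  // Batch sizes apply per Spark task, not per job.
  val esWriteConf: Map[String, String] = Map(
    "es.batch.size.entries"      -> "1000", // max documents per bulk request
    "es.batch.size.bytes"        -> "1mb",  // max payload size per bulk request
    "es.batch.write.retry.count" -> "3"     // retries when a bulk request is rejected
  )

  // docs: (document id, document body) pairs.
  // resource: the target index, supplied by the caller.
  def save(docs: RDD[(String, Map[String, Any])], resource: String): Unit =
    docs.saveToEsWithMeta(resource, esWriteConf)
}
```

Since the batch settings are applied per task, the total number of in-flight bulk requests grows with the number of tasks running concurrently, so smaller batches may be needed when many tasks write at once.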
