I have a question about parallelism when saving an RDD to Elasticsearch.
I have an RDD (created with Spark SQL) with 1000 partitions and an Elasticsearch index with 5 primary shards. I run my application on a Spark cluster with 3 executors.
However, when I call saveToEs I see only one task, running on a single executor, although I would expect the write to run in parallel.
What is going wrong here?