Hello.
I seem to have hit a problem: writing from Spark to Elasticsearch is very slow, and making the initial connection takes a long time (around 15 minutes), during which both Spark and Elasticsearch sit idle.
There is another thread describing the same issue, but it was closed without a solution.
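A long idle wait before the first write can be a sign of a connection problem. By default, elasticsearch-hadoop discovers the cluster's nodes (`es.nodes.discovery` is true) and then talks to the HTTP addresses those nodes publish. If a published address is not reachable from the Spark executors, connections may hang until they time out. One way to check this is to compare the addresses returned by the `_nodes/http` API with the IP given in `es.nodes`. The sketch below uses only the Python standard library. The function names are hypothetical, and `ES_IP` in the usage note is a placeholder.

```python
import json
import urllib.request


def fetch_nodes_http(es_url, timeout=10):
    """GET <es_url>/_nodes/http and return the parsed JSON body."""
    url = f"{es_url.rstrip('/')}/_nodes/http"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def publish_addresses(nodes_http):
    """Map each node's name to its http.publish_address from a _nodes/http response."""
    result = {}
    for node_id, info in nodes_http.get("nodes", {}).items():
        http = info.get("http", {})
        # Fall back to the node id if the response has no "name" field.
        result[info.get("name", node_id)] = http.get("publish_address")
    return result
```

Run `publish_addresses(fetch_nodes_http("http://ES_IP:9200"))` from a Spark worker host. If the printed address differs from the one the executors can reach, discovery is a likely suspect.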
This is how I am writing from Spark to ES:
vgDF.write.format("org.elasticsearch.spark.sql").mode('append').option("es.resource", "demoindex/type1").option("es.nodes", "*ES IP*").save()
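The same write can be expressed with the connector settings gathered in one place, which makes it easier to experiment. The sketch below is a hedged example. `es.nodes.wan.only=true` turns off node discovery so that only the address in `es.nodes` is contacted, and `es.batch.size.entries` / `es.batch.size.bytes` control the size of each bulk request per task. The helper name and its parameters are hypothetical. The defaults shown (port 9200, discovery on, 1000 entries, 1mb) match the connector's documented defaults.

```python
def es_write_options(es_nodes, resource, port=9200, wan_only=False,
                     batch_entries=1000, batch_bytes="1mb"):
    """Build a dict of elasticsearch-hadoop write settings (hypothetical helper).

    Values are kept as strings, the same way they would be passed via .option().
    """
    return {
        "es.resource": resource,
        "es.nodes": es_nodes,
        "es.port": str(port),
        # "true" disables node discovery; only the es.nodes address is used.
        "es.nodes.wan.only": str(wan_only).lower(),
        # Per-task bulk request size limits (by document count and by bytes).
        "es.batch.size.entries": str(batch_entries),
        "es.batch.size.bytes": batch_bytes,
    }
```

Usage, assuming `vgDF` is the DataFrame above and `ES_IP` is a placeholder: `opts = es_write_options("ES_IP", "demoindex/type1", wan_only=True)`, then `vgDF.write.format("org.elasticsearch.spark.sql").mode("append").options(**opts).save()`.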
Spark specifications:
Spark 2.1.0
6 executors, each with 3 CPUs and 10 GB RAM
Running on 3 GCE nodes
Elasticsearch specifications:
Single node with 8 CPUs and 30 GB RAM
Versions:
Elasticsearch: 6.2.2
ES-Hadoop: 6.2.2
Even after this 15-minute period, ingestion is very slow: it took around 45 minutes in total to write only 961 rows from Spark to ES.
For your information, Spark reads the data from Cassandra, processes it (this step is fast, taking around 1-2 minutes), and then writes the results to Elasticsearch.
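Spark evaluates DataFrames lazily, so the Cassandra read and the processing may not actually run until the Elasticsearch `save()` triggers them. To separate the stages, one option is to force evaluation first (for example `vgDF.cache()` followed by `vgDF.count()`) and then time each stage on its own. Below is a minimal standard-library timing helper. The name `timed` is hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Store the wall-clock seconds spent inside the block in timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start
```

Usage sketch: `with timed("read+process", t): vgDF = vgDF.cache(); vgDF.count()`, then `with timed("write to ES", t): vgDF.write...save()`, and compare the two entries in `t`.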
Any help would be greatly appreciated.
Best,
Waleed