Spark Connector performance issue [thread contd]

waleed_ali · March 20, 2018, 10:07am

Hello.

I have hit a problem where writing from Spark to Elasticsearch is very slow. Making the initial connection alone takes around 15 minutes, during which both Spark and Elasticsearch sit idle.
There is another thread highlighting the same issue but it has been closed without any solution.

This is how I am writing from Spark to ES:
vgDF.write.format("org.elasticsearch.spark.sql").mode('append').option("es.resource", "demoindex/type1").option("es.nodes", "*ES IP*").save()

Spark specifications:

    Spark 2.1.0
    3 CPU x 10 GB RAM x 6 executors
    running on 3 GCE nodes

Elasticsearch specifications:

   8 CPU x 30 GB RAM, single node

Versions:

   Elasticsearch: 6.2.2
   ES-Hadoop: 6.2.2

Even after this 15-minute period, the ingestion rate is quite slow. It took around 45 minutes in total to write only 961 rows from Spark to ES.

For your information, Spark reads the data from Cassandra, processes the results (this step is quite fast, around 1-2 minutes), and then writes to Elasticsearch.

Any help would be greatly appreciated.

Best,
Waleed

james.baiera · March 22, 2018, 8:32pm

You could try taking a look at the network response times by using tools like tcpdump. This will give a better idea of where the hangup is occurring, either on the Hadoop end, the Elasticsearch end, or the network in between them.
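
As a lighter first check alongside a packet capture, a minimal Python sketch like the one below times how long it takes just to open a TCP connection to an Elasticsearch node. It is not a replacement for tcpdump and is not part of ES-Hadoop; the function name time_tcp_connect is made up for illustration, and port 9200 is assumed to be the node's HTTP port (the Elasticsearch default).

```python
import socket
import sys
import time


def time_tcp_connect(host, port=9200, timeout=5.0):
    """Return the seconds taken to open a TCP connection to host:port.

    Hypothetical helper for illustration. Port 9200 is assumed to be the
    Elasticsearch HTTP port; adjust it if your node listens elsewhere.
    """
    start = time.perf_counter()
    # create_connection raises OSError (including timeouts) on failure.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start


if __name__ == "__main__":
    # Pass the addresses to compare as command-line arguments.
    for host in sys.argv[1:]:
        try:
            print(f"{host}: {time_tcp_connect(host):.3f}s")
        except OSError as exc:
            print(f"{host}: connection failed ({exc})")
```

Running it from a Spark worker against each address you could put in es.nodes shows whether the delay is already visible at the connection level. If every connect is fast, the hangup is more likely further up the stack.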

waleed_ali · April 10, 2018, 6:50am

I asked the same question on Stack Overflow too, and one person there suggested using the private IP of the ELK instance instead of its public IP when ingesting data from Spark into ES.

This solved both the slow initial connection and the slow writes, cutting the overall ingestion time from around 15-20 minutes to only 12-15 seconds!

I hope this saves other people some time as well.
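
For anyone landing here later, the change can be sketched roughly as follows in PySpark. This is only an illustration under assumptions: the helper names es_write_options and write_to_es are hypothetical, 10.128.0.2 is a placeholder address rather than the one actually used, and the is_private check simply relies on Python's ipaddress module to catch a public address by mistake. The format string and the es.nodes / es.resource options are the same ones as in the original write call; es.port is set explicitly to the default 9200.

```python
import ipaddress


def es_write_options(node_ip, resource="demoindex/type1", port=9200):
    """Build ES-Hadoop writer options that point at a private address.

    Hypothetical helper: it refuses addresses that are not in a private
    range, so a public IP is not configured by accident.
    """
    addr = ipaddress.ip_address(node_ip)
    if not addr.is_private:
        raise ValueError(f"{node_ip} is not a private address")
    return {
        "es.nodes": str(addr),
        "es.port": str(port),
        "es.resource": resource,
    }


def write_to_es(df, options):
    """Append a Spark DataFrame to Elasticsearch using the given options."""
    (df.write.format("org.elasticsearch.spark.sql")
        .mode("append")
        .options(**options)
        .save())


# Usage, assuming vgDF is the DataFrame from the original post and
# 10.128.0.2 stands in for the private IP of the ES node:
#   write_to_es(vgDF, es_write_options("10.128.0.2"))
```

The only functional difference from the original write is the value passed to es.nodes.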

system · May 8, 2018, 6:51am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
