Hi, I am using Spark 2.4.0, Elasticsearch 6.6.2, and elasticsearch-spark-20_2.11-6.8.1.jar as the connector.
I am running Spark in local mode and have configured the memory to 8 GB.
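In case the setup matters: sc is created roughly as in the sketch below; the es.nodes/es.port values are only placeholders for my actual cluster address, and the 8 GB itself is passed at launch time (--driver-memory 8g) rather than set in code.

import org.apache.spark.{SparkConf, SparkContext}

// Roughly how sc is configured (es.nodes/es.port are placeholders);
// the 8 GB driver memory is given when launching, e.g. --driver-memory 8g.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("es-index-to-dataframe")
  .set("es.nodes", "localhost") // placeholder Elasticsearch host
  .set("es.port", "9200")       // placeholder Elasticsearch port
val sc = new SparkContext(conf)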
I have an Elasticsearch index with 14 million documents.
I want to load the whole index into a Spark DataFrame, so I am doing:
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._ // brings in the esDF method

val sql = new SQLContext(sc)
val myDF = sql.esDF("my-index/my-type").cache() // read the whole index and cache it
println(myDF.count())                           // action that forces the full read
I can see the memory filling up bit by bit, which is expected because of the cache(), and the memory is large enough to hold the entire dataset. However, the process is extremely slow (over 2 hours).
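In case it helps with diagnosing, I can also check how many partitions the connector split the read into (reusing myDF from the snippet above); as far as I understand, it creates one Spark partition per index shard:

// Quick check of the read parallelism; the connector should create
// one Spark partition per index shard, as far as I know.
println(myDF.rdd.getNumPartitions)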
Any hint is highly appreciated.
Thanks