I am trying to read data from Elasticsearch into a DataFrame using the Java ES-Spark connector. When I execute a query such as count() on the DataFrame, performance is dismal: for about 3 GB of data it takes around 6 minutes. If I instead save the data to Hadoop/HDFS and read it from there, the same query takes around 3 seconds. Can someone suggest a workaround? The code I am using is below.
SparkConf conf = new SparkConf().setAppName("Simple App").setMaster("local[*]");
conf.set("es.index.auto.create", "true");
SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

// Read the index through the ES-Spark connector
Dataset<Row> ds = spark.read().format("org.elasticsearch.spark.sql").load("index_name");
long count = ds.count(); // takes around 6 minutes for 3 GB of data
I am also having performance problems: the same Spark query run against the raw JSON files is 12-15 times faster than the same query via ES.
I asked about this on the GitHub issues, but they told me to ask here. In any case, we need to find a way to inspect the pushed-down query to understand what's going on. Please read this too, because what is essentially happening is that the ES driver is not that smart: it just does a full scroll of the data instead of leveraging ES's capabilities.
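A few connector settings are worth trying before concluding the scroll is the bottleneck. The sketch below is based on option names from the elasticsearch-hadoop configuration docs (`pushdown`, `es.read.field.include`, `es.scroll.size`); the field list and index name are placeholders, and the values are illustrative tuning guesses, not verified fixes:

```java
// Sketch: trim what the connector pulls from Elasticsearch.
// "pushdown" asks Spark SQL filters to be translated into ES query DSL
// (it defaults to true in recent es-hadoop versions);
// es.read.field.include limits which _source fields are fetched per document;
// es.scroll.size uses larger scroll batches, i.e. fewer round trips.
Dataset<Row> ds = spark.read()
    .format("org.elasticsearch.spark.sql")
    .option("pushdown", "true")
    .option("es.read.field.include", "id,price") // hypothetical field list
    .option("es.scroll.size", "10000")
    .load("index_name");

// explain() prints the physical plan; if pushdown is working you should
// see the filter appear as PushedFilters on the Elasticsearch relation.
ds.filter("price > 100").explain();
```

Comparing the plan with and without the filter should tell you whether the connector is building an ES query or falling back to a full scan.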