Spark Connector performance issues with startup time

I am currently experiencing performance issues where the Elasticsearch Spark connector is stuck for tens of minutes warming up / starting, with no tasks or jobs allocated. See the screenshot. Any idea what is causing the tens of minutes of warm-up / startup time?

I have leveraged both PySpark and the Spark shell (Scala) and noticed the same issue.

Here's how I perform the query:

# Read from all indices matching 'argus*' via the ES-Hadoop Spark SQL source
argusDF = (sqlContext.read.format('org.elasticsearch.spark.sql')
    .option('es.resource', 'argus*')
    .option('es.nodes', esCluster)
    .option('es.port', '9200')
    .option('pushdown', 'true')
    .option('es.scroll.keepalive', '2m')
    .option('es.http.timeout', '2m')
    .option('es.input.max.docs.per.partition', '10000')
    .option('es.scroll.size', '5000')
    .option('double.filtering', 'true')
    .option('es.index.read.missing.as.empty', 'true')
    .option('es.read.field.empty.as.null', 'true')
    .load())
argusTable = 'argus'
argusDF.registerTempTable(argusTable)

sqlContext.sql('select ' + fieldsToQueryCSV + ' from ' + argusTable + ' where ' +
'src_ip= "'+ ipAddress + '" OR ' +
'dest_ip= "'+ ipAddress +'"')

@tranan - that's really peculiar. Are you able to get onto the Spark executor and collect a thread dump from the process, using something like jstack, to see what it is stuck doing?
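For reference, a minimal sketch of grabbing such a dump from an executor host (it assumes the JDK's jstack is on the PATH and that you have already located the executor JVM's PID, for example with jps -lm; the PID and output path below are placeholders):

import subprocess

def dump_executor_threads(pid, out_path):
    # jstack ships with the JDK; -l adds lock / synchronizer details.
    with open(out_path, 'w') as out:
        subprocess.check_call(['jstack', '-l', str(pid)], stdout=out)

# Replace the placeholder PID with the executor JVM's PID from jps -lm.
dump_executor_threads(12345, '/tmp/executor-threads.txt')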

Attached is a screenshot from the executor tab. Note that there are a number of executors and cores that are live but no tasks.

Second, I clicked the thread dump link on one of the live executors; see the thread dump screenshot below. It looks similar for all of the live executors.

Please note this is running on YARN. I have also reproduced this issue on another cluster running Spark Standalone, pointing to the same Elasticsearch cluster. I also want to note that the regular REST and Java APIs work perfectly.

It seems the cluster is totally idle, waiting for work to be assigned. Would you be able to pull a thread dump from your Spark driver to see what it might be stuck doing that is keeping it from assigning work?

Below is the thread dump of the driver... I suspect the cluster is trying to get a file handle on all of the shards in the cluster and is blocking until it can. There are roughly 3300 shards in the cluster, about 30 TB of data. I also noticed that if I change the HTTP timeout to less than 1m, I get a timeout exception.
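As a sanity check on that shard count, here is a quick sketch of counting the shards the argus* pattern resolves to (it assumes the requests package is available and that esCluster is a single HTTP-reachable node on port 9200):

import requests

# _cat/shards prints one line per shard copy; keep only primaries to count shards.
resp = requests.get('http://' + esCluster + ':9200/_cat/shards/argus*?h=index,shard,prirep', timeout=120)
primaries = [line for line in resp.text.splitlines() if line.strip().endswith('p')]
print('Primary shards matched by argus*:', len(primaries))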

I'm not sure I see any ES-Hadoop related classes in those thread dumps. ES-Hadoop does create at least one Spark partition for each shard in the indices that it wants to read from. It's possible that Spark is having a hard time handling that many partitions during its startup phase.
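One quick way to test that theory, as a sketch reusing the argusDF from the snippet above: ask Spark how many input partitions the connector created. Note that computing this triggers the same shard / slice lookups the driver performs at startup, so it may itself take a while against ~3300 shards.

# Number of tasks the driver will have to schedule for the read.
num_parts = argusDF.rdd.getNumPartitions()
print('Input partitions created by the connector:', num_parts)

If that number is in the thousands (with es.input.max.docs.per.partition set to 10000, large shards may be split into several partitions each, pushing it even higher), narrowing es.resource from argus* to specific indices or raising es.input.max.docs.per.partition should bring it down.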
