"Could not get a Transport from the Transport Pool for host" error slows down spark job to dump data into ES clusters

yuecong · May 19, 2020, 4:28am

I have a Spark job that writes data into an ES cluster in WAN mode. When the documents are large, around 150 KB each, the Spark job logs many "Could not get a Transport from the Transport Pool for host" errors, but we cannot see any errors on the ES cluster side. All thread_pool rejection counts are zero or very low, and we can see that the tasks are spread across all ES data nodes.

Here is the code snippet showing the configuration on the Spark side.

    val query = filteredlog.writeStream
      .outputMode("append")
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes.wan.only", "true")
      .option("es.port", "9200")
      .option("es.net.http.auth.user", elasticsearchUsername)
      .option("es.net.http.auth.pass", elasticsearchPassword)
      .option("checkpointLocation", streamCheckpointFolder)
      .option("es.net.ssl", "true")
      .option("es.net.ssl.cert.allow.self.signed", "true")
      .option("es.mapping.date.rich", "true")
      .option("es.nodes", elasticsearchHostName)
      .option("es.resource.write", "my-log-{dateHourUtc}")
      .option("es.http.timeout", "10s")
      .start()

Every mini-batch has 96 partitions, each with the same amount of data to write to the ES cluster in WAN mode. Some of the tasks fail (sometimes 4, sometimes as many as 30), which slows down the mini-batch and eventually causes the streaming job to fall behind in writing data from the stream to the ES cluster.

Can I get some suggestions on what "Could not get a Transport from the Transport Pool for host" means on the Spark side? Why do I get this error when my ES cluster does not appear to be overloaded at all? Which parameters can I tune to fix this?
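
For reference, these are the es-hadoop connector settings that, as far as I know, control bulk request size, write retries, and HTTP timeouts. They are gathered here as a plain Scala map so they can be tried alongside the options above. The name esTuning and all of the values are assumptions for illustration only; none of this is a confirmed fix for the error, and the defaults noted in the comments should be checked against the connector version in use.

```scala
// Hypothetical tuning values for illustration; not a verified fix for the error.
val esTuning: Map[String, String] = Map(
  // Size of each bulk request per task (connector default: 1mb).
  "es.batch.size.bytes" -> "1mb",
  // Max documents per bulk request per task (default: 1000).
  // With ~150 KB documents the byte limit is reached first.
  "es.batch.size.entries" -> "100",
  // Retries for a bulk batch whose data was rejected (default: 3).
  "es.batch.write.retry.count" -> "6",
  // Wait between those retries (default: 10s).
  "es.batch.write.retry.wait" -> "30s",
  // HTTP timeout for connections to ES (default: 1m; the job above sets 10s).
  "es.http.timeout" -> "1m",
  // Retries for establishing a broken HTTP connection (default: 3).
  "es.http.retries" -> "3"
)
```

The map could be applied in one call with .options(esTuning) on the writeStream builder before .start(). Note that each of the 96 tasks sends its own bulk requests, so the per-task sizes multiply across concurrently running tasks.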

Thanks

system · June 16, 2020, 4:40am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
