"Could not get a Transport from the Transport Pool for host" error slows down Spark job writing data to an ES cluster

I have a Spark job that writes data to an ES cluster in WAN mode. We observed that when the documents are large, around 150 KB each, the Spark job logs many "Could not get a Transport from the Transport Pool for host" errors, but we cannot see any errors on the ES cluster side. Thread pool rejections are all zero or very low, and we can see that the tasks are spread across all ES data nodes.

Here is the code snippet with the configuration on the Spark side.

    val query = filteredlog.writeStream
      .outputMode("append")
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes.wan.only", "true")
      .option("es.port", "9200")
      .option("es.net.http.auth.user", elasticsearchUsername)
      .option("es.net.http.auth.pass", elasticsearchPassword)
      .option("checkpointLocation", streamCheckpointFolder)
      .option("es.net.ssl", "true")
      .option("es.net.ssl.cert.allow.self.signed", "true")
      .option("es.mapping.date.rich", "true")
      .option("es.nodes", elasticsearchHostName)
      .option("es.resource.write", "my-log-{dateHourUtc}")
      .option("es.http.timeout", "10s")
      .start()

Every mini-batch has 96 partitions, each with the same amount of data to write to the ES cluster in WAN mode. Some of the tasks fail (sometimes 4, sometimes as many as 30), which slows down the mini-batch and eventually causes the streaming job to fall behind in writing data from the stream to the ES cluster.
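
For reference, the options above override only es.http.timeout, so the connector's bulk and retry settings stay at their defaults. The following is a minimal sketch, not a recommendation: it assumes the default values listed in the elasticsearch-hadoop configuration documentation (which should be checked against the connector version in use) and an average document size of 150 KB. The names `BulkSizing`, `avgDocBytes`, `batchBytes`, and `tasksPerBatch` are illustrative only.

```scala
// Illustrative only: default values are assumed from the elasticsearch-hadoop
// configuration docs and should be verified for the connector version in use.
object BulkSizing {
  // Bulk and retry settings that the writeStream options above do not override.
  val connectorDefaults: Map[String, String] = Map(
    "es.batch.size.bytes"        -> "1mb",  // bulk buffer size, per task
    "es.batch.size.entries"      -> "1000", // bulk buffer entry count, per task
    "es.batch.write.retry.count" -> "3",
    "es.batch.write.retry.wait"  -> "10s",
    "es.http.retries"            -> "3"
  )

  val avgDocBytes: Long  = 150L * 1024  // ~150 KB per document, as described above
  val batchBytes: Long   = 1024L * 1024 // es.batch.size.bytes = 1mb
  val tasksPerBatch: Int = 96           // partitions per mini-batch

  // Approximate number of documents that fit in one bulk request
  // before the byte limit is reached.
  val docsPerBulk: Long = batchBytes / avgDocBytes

  def main(args: Array[String]): Unit = {
    println(s"approx. docs per bulk request: $docsPerBulk")
    println(s"upper bound on concurrent bulk writers: $tasksPerBatch")
  }
}
```

Under these assumptions, each bulk request carries only a handful of documents, and up to 96 tasks may be sending such requests through the single WAN endpoint at the same time, depending on how many executor cores are available.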

Can I get some suggestions on what "Could not get a Transport from the Transport Pool for host" means on the Spark side? Why do I get this error when my ES cluster does not appear to be overloaded at all? What parameters can I tune to fix this?

Thanks
