Executor/worker closing connection before trying to write, causing an SSLHandshakeException


#1

Hello,

For about a third of our files, writing the DataFrame to Elasticsearch from our Spark application via the Elasticsearch Spark connector fails with the following exception:
org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake

The certificates seem to be correct; otherwise we would expect the write to fail 100% of the time.

We use Elasticsearch 6.1.3 and the "Elasticsearch Spark (for Spark 2.0) » 6.1.3" connector.

The full exception:
¡error while spark execution=Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 7, xxx.tb.de, executor 1): org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:124)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:466)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:430)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:155)
at org.elasticsearch.hadoop.rest.RestClient.getHttpNodes(RestClient.java:112)
at org.elasticsearch.hadoop.rest.RestClient.getHttpDataNodes(RestClient.java:129)
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:581)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1002)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:757)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(HttpConnection.java:828)
at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2116)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:478)
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:112)
... 17 more
Caused by: java.io.EOFException: SSL peer shut down incorrectly
at sun.security.ssl.InputRecord.read(InputRecord.java:505)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
... 31 more

787a89a49cfc¡cause=org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake


Executor logs:

Executor task launch worker for task 3, READ: TLSv1.2 Handshake, length = 333
Executor task launch worker for task 3, READ: TLSv1.2 Handshake, length = 4
Executor task launch worker for task 3, WRITE: TLSv1.2 Handshake, length = 70
Executor task launch worker for task 3, WRITE: TLSv1.2 Change Cipher Spec, length = 1
Executor task launch worker for task 3, WRITE: TLSv1.2 Handshake, length = 64
Executor task launch worker for task 3, READ: TLSv1.2 Change Cipher Spec, length = 1
Executor task launch worker for task 3, READ: TLSv1.2 Handshake, length = 64
%% Cached client session: [Session-1, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
Executor task launch worker for task 3, WRITE: TLSv1.2 Application Data, length = 256
Executor task launch worker for task 3, READ: TLSv1.2 Application Data, length = 2640
Executor task launch worker for task 3, called close()
Executor task launch worker for task 3, called closeInternal(true)
Executor task launch worker for task 3, SEND TLSv1.2 ALERT: warning, description = close_notify
Executor task launch worker for task 3, WRITE: TLSv1.2 Alert, length = 48
Executor task launch worker for task 3, called closeSocket(true)
Executor task launch worker for task 3, called close()
Executor task launch worker for task 3, called closeInternal(true)
Executor task launch worker for task 3, called close()
Executor task launch worker for task 3, called closeInternal(true)
Executor task launch worker for task 3, WRITE: TLSv1.2 Handshake, length = 209
Executor task launch worker for task 3, received EOFException: error
Executor task launch worker for task 3, handling exception: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
Executor task launch worker for task 3, SEND TLSv1.2 ALERT: fatal, description = handshake_failure
Executor task launch worker for task 3, WRITE: TLSv1.2 Alert, length = 2
Executor task launch worker for task 3, Exception sending alert: java.net.SocketException: Broken pipe (Write failed)
Executor task launch worker for task 3, called closeSocket()
Executor task launch worker for task 3, called close()
Executor task launch worker for task 3, called closeInternal(true)
Executor task launch worker for task 3, called close()
Executor task launch worker for task 3, called closeInternal(true)
Executor task launch worker for task 3, called close()
Executor task launch worker for task 3, called closeInternal(true)

It looks as if the "worker for task 3" closes the connection before it tries to WRITE (handshake?) again, and that second attempt ends in the exception. Sometimes it works, sometimes it doesn't.

We already tried setting all environments to TLS 1.0 and TLS 1.1, and are currently using TLS 1.2.

Does anyone have any ideas? Thanks in advance!


#2

Hello,

We fixed the problem:
The IP whitelisting between the Hadoop worker nodes, the ES master node and the ES data node wasn't configured properly. Our Hadoop worker nodes couldn't connect to the ES master node.
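
For anyone who runs into the same symptom: one way to spot this kind of network restriction is to run a small probe on each worker node against every Elasticsearch node. The following is only a rough sketch, not what we actually ran. The function names are made up for illustration, port 9200 is an assumption, and certificate verification is switched off on purpose so that the probe only answers whether a TLS handshake can complete at all.

```python
import socket
import ssl


def tls_handshake_ok(host, port=9200, timeout=5.0):
    """Try a TCP connect plus TLS handshake; return (ok, detail)."""
    ctx = ssl.create_default_context()
    # Certificate checks are deliberately disabled: this probe is only meant
    # to show whether the network path allows a handshake at all.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                # On success, report the negotiated protocol, e.g. "TLSv1.2".
                return True, tls.version()
    except OSError as exc:  # covers refused, timed out, DNS and SSL errors
        return False, "{}: {}".format(type(exc).__name__, exc)


def check_nodes(nodes, timeout=5.0):
    """Probe every (host, port) pair and return {"host:port": (ok, detail)}."""
    return {
        "{}:{}".format(host, port): tls_handshake_ok(host, port, timeout)
        for host, port in nodes
    }
```

Calling check_nodes with a list of (host, port) pairs for the master and data nodes, once per worker node, shows which combinations fail. A node that fails from some workers but not from others points to a network or whitelisting issue rather than a certificate problem.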

Thanks

