Spark EsRDDWriter intermittent failure

Hi,
We are seeing intermittent errors when trying to write to an Elasticsearch 5.3 cluster using Spark's EsRDDWriter (which internally uses the elasticsearch-hadoop project, version 5.2.2).

Is there any way to configure the client.transport.ping_timeout value through the elasticsearch-hadoop project?
The cluster itself seems healthy throughout the period. The intermittent failures cause the Spark job to fail and add time to each run.
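For context, ES-Hadoop talks to the cluster over HTTP (port 9200) rather than the transport protocol, so client.transport.ping_timeout would not apply to it; the connector exposes its own timeout settings instead. A minimal sketch of passing those settings when writing an RDD, using keys from the elasticsearch-hadoop configuration reference (the node address and values below are illustrative):

```scala
// Sketch: tuning ES-Hadoop's HTTP-level timeouts instead of the
// transport client's client.transport.ping_timeout (which ES-Hadoop
// does not use). Keys come from the elasticsearch-hadoop config docs.
val esConf = Map(
  "es.nodes"        -> "11.38.95.211:9200", // HTTP endpoint, not the 9300 transport port
  "es.http.timeout" -> "2m",                // timeout for HTTP/REST connections (default 1m)
  "es.http.retries" -> "5"                  // retries per HTTP call (default 3)
)
// With elasticsearch-spark on the classpath:
// import org.elasticsearch.spark._
// rdd.saveToEs("my-index/my-type", esConf)  // index name is illustrative
```

The same keys can also be set globally on the SparkConf with a `spark.` prefix (e.g. `spark.es.http.timeout`).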

17/09/14 14:05:33 INFO TransportClientNodesService: failed to get node info for {#transport#-1}{APUY_l8CQEiTytE1emuB1A}{11.38.95.211}{11.38.95.211:9300}, disconnecting...
ReceiveTimeoutTransportException[[][11.38.95.211:9300][cluster:monitor/nodes/liveness] request_id [59] timed out after [5000ms]]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:840)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:444)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/09/14 14:06:28 WARN TransportPool: Could not validate pooled connection on lease. Releasing pooled connection and trying again...
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransport.execute(CommonsHttpTransport.java:471)
at org.elasticsearch.hadoop.rest.pooling.TransportPool.validate(TransportPool.java:98)
at org.elasticsearch.hadoop.rest.pooling.TransportPool.borrowTransport(TransportPool.java:127)
at org.elasticsearch.hadoop.rest.pooling.PooledHttpTransportFactory.borrowFrom(PooledHttpTransportFactory.java:104)
at org.elasticsearch.hadoop.rest.pooling.PooledHttpTransportFactory.create(PooledHttpTransportFactory.java:55)
at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
at org.elasticsearch.hadoop.rest.NetworkClient.&lt;init&gt;(NetworkClient.java:82)
at org.elasticsearch.hadoop.rest.NetworkClient.&lt;init&gt;(NetworkClient.java:59)
at org.elasticsearch.hadoop.rest.RestClient.&lt;init&gt;(RestClient.java:91)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:240)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:546)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

@Anusha_Dharmalingam, I'm a little perplexed by these logs. I see a connection timeout from testing an HTTP connection from the ES-Hadoop library, but right before it there is a log line that seems to be related to the regular Elasticsearch transport client. Could you elaborate on what your configuration is and what you're trying to accomplish? Are you using both the Transport client AND ES-Hadoop in your job?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.