org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes


(Carlos Palomares) #1

We have a problem with the Elasticsearch library for Hadoop in Spark SQL with Scala. Everything works perfectly when we save an RDD into a specific index in the cluster:

3 nodes
16 GB of RAM on each node
default configuration in elasticsearch.yml, no special parameters
HDD, not SSD
200 GB of HDD on every node
3 shards on every index

We have 3 indices:
index-1 with 400,000 docs
index-2 with 9,400,000 docs
index-3 with 29,400,000 docs

When we run our jobs inserting into index-1 and index-2, everything is fine. But when we insert into index-3, with its 29,400,000 docs, about 100,000 docs go in and then suddenly:
18/06/12 13:10:57 ERROR NetworkClient: Node [10.133.1.239:9200] failed (Read timed out); selected next node [10.133.1.248:9200]
18/06/12 13:10:57 ERROR NetworkClient: Node [10.133.1.239:9200] failed (Read timed out); selected next node [10.133.1.249:9200]
18/06/12 13:10:57 ERROR NetworkClient: Node [10.133.1.239:9200] failed (Read timed out); selected next node [10.133.1.248:9200]
18/06/12 13:11:57 ERROR NetworkClient: Node [10.133.1.248:9200] failed (Read timed out); selected next node [10.133.1.249:9200]
18/06/12 13:11:57 ERROR NetworkClient: Node [10.133.1.248:9200] failed (Read timed out); selected next node [10.133.1.249:9200]
18/06/12 13:11:58 ERROR NetworkClient: Node [10.133.1.249:9200] failed (Read timed out); selected next node [10.133.1.248:9200]
18/06/12 13:12:58 ERROR NetworkClient: Node [10.133.1.249:9200] failed (Read timed out); no other nodes left - aborting...
18/06/12 13:12:58 ERROR NetworkClient: Node [10.133.1.249:9200] failed (Read timed out); no other nodes left - aborting...
18/06/12 13:12:58 ERROR Executor: Exception in task 4.0 in stage 7.0 (TID 106)
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.133.1.239:9200, 10.133.1.248:9200, 10.133.1.249:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:149)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:380)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:364)
at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:216)
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.tryFlush(BulkProcess

Obviously the task dies and the job fails with it, but the Elasticsearch nodes are still alive. If we try the same insert into a new index, everything is fine. We don't understand why.

Parameters for elasticsearch-hadoop:
"elastic.nodes.value" -> [10.133.1.239, 10.133.1.248, 10.133.1.249]
"elastic.port.value" -> 9200
"elastic.mappingid.value" -> uuid

Maybe we need some extra parameters? We are not on AWS; we have 3 machines on premises. The job starts inserting, writes 100,000 docs, and then the connections are closed... but only with the index that has a lot of documents.

Any help?


(James Baiera) #2

The 100,000 document count makes sense, as this is the default number of documents that the connector batches up before sending them to the Elasticsearch cluster as a bulk request. In this case, it's entirely possible that your nodes are overloaded trying to insert the data for whatever reason. I see that the timeouts occur at 1 minute before the errors are thrown. Increasing es.http.timeout to a value higher than 1m might allow the jobs to complete in the meantime. Note that increasing that timeout value may lead to slower job times.
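As a minimal sketch of what tuning these settings could look like, the map below collects es-hadoop write options (these key names are real connector settings; the values are illustrative, not recommendations, and the node IPs and mapping id are taken from the thread):

```scala
// Hypothetical es-hadoop write configuration for the failing job.
// All keys are real elasticsearch-hadoop settings; values are examples to tune.
val esWriteConf = Map(
  "es.nodes"                   -> "10.133.1.239,10.133.1.248,10.133.1.249",
  "es.port"                    -> "9200",
  "es.mapping.id"              -> "uuid",
  "es.http.timeout"            -> "5m",   // raised from the 1m default, per the advice above
  "es.batch.size.entries"      -> "1000", // smaller bulk batches put less pressure on the cluster
  "es.batch.write.retry.count" -> "6"     // retry a bulk batch a few more times before giving up
)

// With elasticsearch-spark on the classpath, the map would be passed to the writer:
// import org.elasticsearch.spark._
// rdd.saveToEs("index-3/doc", esWriteConf)
```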

In terms of investigating write speed issues, upgrading from HDD to SSD is generally a good step. Increasing the time between index refreshes might also help.
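For the refresh-interval suggestion, one possible sketch is a plain JDK HTTP call that updates the standard `index.refresh_interval` setting before a heavy bulk load (the endpoint and setting name are standard Elasticsearch; the node address is from the thread, and the helper names and "30s" value are hypothetical examples):

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

// Build the JSON body for an index-settings update (standard ES setting name).
def refreshSettingsBody(interval: String): String =
  s"""{"index":{"refresh_interval":"$interval"}}"""

// PUT the setting to /<index>/_settings using only the JDK, no extra deps.
// Returns the HTTP status code (200 on success).
def setRefreshInterval(node: String, index: String, interval: String): Int = {
  val conn = new URL(s"http://$node/$index/_settings")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("PUT")
  conn.setRequestProperty("Content-Type", "application/json")
  conn.setDoOutput(true)
  conn.getOutputStream.write(refreshSettingsBody(interval).getBytes(StandardCharsets.UTF_8))
  conn.getResponseCode
}

// Example (not executed here, requires a reachable cluster):
// setRefreshInterval("10.133.1.239:9200", "index-3", "30s")
// Remember to restore a normal interval (e.g. "1s") after the load finishes.
```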


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.