Correct settings for "es.nodes.wan.only"

darthapple · August 23, 2016, 8:56pm

Hello,

I am trying to run a Spark job that loads data from EMR into an ES cluster hosted by elastic.co (cluster ID "665e60").

Following is the code snippet I am using to test this.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

import org.elasticsearch.spark._

// es-hadoop settings passed through SparkConf carry the "spark." prefix
val conf = new SparkConf()
conf.set("spark.es.nodes", "ssssss.us-east-1.aws.found.io")
conf.set("spark.es.port", "9243")
conf.set("spark.es.nodes.discovery", "true")
conf.set("spark.es.nodes.client.only", "false")
conf.set("spark.es.nodes.wan.only", "false")
conf.set("spark.es.net.http.auth.user", "sssss")
conf.set("spark.es.net.http.auth.pass", "lololol")
val sc = new SparkContext(conf)
//  print(conf.toDebugString)
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")

This results in the following error:
> org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
> at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:196)
> at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
> at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)

Can someone please suggest what configuration I am missing here?

james.baiera · August 24, 2016, 3:51pm

Your Spark driver is able to connect to the ES Cloud instance, but it seems that the Executors do not have the capability to establish a connection. This is normally the case when running in cloud environments, as most cloud environments keep all of their executors/task runners inside of a separate secured network. You will need to make sure that the executors/task runners are able to connect to the provided ES node by configuring the network settings of your deployment.
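One way to test this diagnosis is to attempt a plain TCP connection to the ES endpoint from inside the executors rather than from the driver. The helper below is a minimal sketch that uses only the JDK; the object name `EsReachability`, the method `canReach` and the 5-second default timeout are illustrative assumptions, not part of es-hadoop.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object EsReachability {
  /** Returns true if a TCP connection to host:port opens within timeoutMs milliseconds.
    * DNS failures, refused connections and timeouts all surface as IOException -> false.
    */
  def canReach(host: String, port: Int, timeoutMs: Int = 5000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: IOException => false
    } finally {
      socket.close()
    }
  }
}
```

On a cluster, something like `sc.parallelize(1 to 10, 10).map(_ => EsReachability.canReach("ssssss.us-east-1.aws.found.io", 9243)).collect()` would run the check inside tasks; getting `false` there while the same call returns `true` on the driver would match the situation described above. A TCP check only proves the port is open, not that TLS or authentication will succeed.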

darthapple · October 12, 2016, 10:41pm

Thanks James,
I was able to figure this out.

animageofmine · January 31, 2017, 4:29am

@darthapple Can you share what the issue was and how you fixed it?

bobby259 · May 10, 2017, 5:32pm

@darthapple what was the resolution for this?
@animageofmine were you able to resolve this? I am facing the same issue. Please help.

jobyjohny · June 29, 2017, 8:56pm

I am facing this issue now. Could you please share the details of the fix?

Mayank_Vijay · July 6, 2017, 7:39am

Hello,
I just want to view data stored in Elasticsearch as a Hive table. The Hive query I run is:
CREATE EXTERNAL TABLE testHiveELKTable (account int, quantity int) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'index/type');

But I get:
Failed: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - this typically happens if network/elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without proper setting in 'es.nodes.wan.only'

I even tried setting es.nodes.wan.only = true, but the error still shows.

donghe90 · November 16, 2017, 10:35pm

This issue was a headache for me as well. I worked around it by adding the executors' IP addresses to the AWS Elasticsearch access policy. Hope this helps.

Mohammed_sheik · October 24, 2018, 12:14pm

James,

We are running Spark locally with a standalone master, so if the Spark driver is able to connect, what could the other reasons be?

nara · January 9, 2019, 5:05pm

Hi @darthapple,

Could you please post the solution? I am experiencing the same issue. I have both EMR and ES hosted on AWS. Thanks!

Dima_Dermanskyi · September 19, 2019, 10:03am

For me, even with es.nodes.wan.only = true, it still failed with org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version.

I use ES provided by the AWS Elasticsearch Service. To overcome the problem I also had to add the following config property: es.net.ssl = true.
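Putting the settings mentioned in this thread together, a minimal sketch of an es-hadoop configuration for a hosted, HTTPS-only cluster might look like the map below. The endpoint, port 9243 and credentials are the placeholders from the original question (AWS Elasticsearch Service endpoints typically listen on 443 instead), and the object name `CloudEsSettings` is hypothetical. Keys in a map passed directly to the connector use the plain `es.` prefix rather than `spark.es.`.

```scala
object CloudEsSettings {
  // Placeholder endpoint and credentials taken from the question at the top of the thread.
  val settings: Map[String, String] = Map(
    "es.nodes"              -> "ssssss.us-east-1.aws.found.io",
    "es.port"               -> "9243",
    // Connect only through the declared es.nodes; no discovery of internal node
    // addresses that are unreachable from outside the hosted cluster's network.
    "es.nodes.wan.only"     -> "true",
    // Hosted endpoints are served over HTTPS.
    "es.net.ssl"            -> "true",
    "es.net.http.auth.user" -> "sssss",
    "es.net.http.auth.pass" -> "lololol"
  )
}
```

With `import org.elasticsearch.spark._` in scope, the map can be handed to the write call, e.g. `sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", CloudEsSettings.settings)`. Executors still need network access to the endpoint for this to help.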

Phan_Trung · September 25, 2019, 3:55am

Just import the connector version that matches your Elasticsearch cluster version and it will work every time.

james.baiera · October 16, 2019, 8:29pm

Not being able to connect to Elasticsearch is a common problem and we appreciate when people post solutions that got them out of that problem. That said, I am going to close this thread since it is quite old and is often resurrected while a new thread may have been better to post questions and solutions in. Thank you all for your participation!
