Error when making multiple calls to writeToEs

Hi
Using Elasticsearch 5.4.0, spark-2.1.1-bin-hadoop2.7, and elasticsearch-hadoop-5.4.0 on macOS, I need to write 3 Datasets to my ES.
I call the writeToEs method 3 times. If I write only 1 of the 3 Datasets, everything is fine. As soon as I try to write 2 datasets, I get the following error (note that the first dataset is written to ES correctly).
I'm working locally on my laptop; the same code and config work fine in other environments (Linux).
If someone has an idea where the issue is, that would be great :slight_smile:
Thanks in advance
Pascal

2017-05-31 13:59:54 ERROR NetworkClient - Node [127.0.0.1:9200] failed (Connection reset); no other nodes left - aborting...

2017-05-31 13:59:54 ERROR Executor - Exception in task 20.0 in stage 3.0 (TID 9)
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:283)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:572)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:150)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:425)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:429)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:155)
at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:627)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:276)
... 10 more
2017-05-31 13:59:54 ERROR TaskSetManager - Task 20 in stage 3.0 failed 1 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 3.0 failed 1 times, most recent failure: Lost task 20.0 in stage 3.0 (TID 9, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:283)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:572)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:150)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:425)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:429)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:155)
at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:627)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:276)
... 10 more
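
For reference, the 'es.nodes.wan.only' option mentioned in the error is a standard elasticsearch-hadoop setting. Here is a minimal sketch of where it would go on the Spark configuration; the host and port are placeholders, and this only illustrates the option, it is not a confirmed fix for this case:

SparkConf conf = new SparkConf()
		.setAppName("Spark App")
		.setMaster("local[*]")
		.set("es.nodes", "localhost")       // placeholder host
		.set("es.port", "9200")             // placeholder port
		.set("es.nodes.wan.only", "true");  // talk only to the declared node(s), skip node discovery
JavaSparkContext sparkContext = new JavaSparkContext(conf);

With this flag the connector sticks to the configured address instead of trying to discover other cluster nodes, which is why the error message calls it out for WAN/Cloud setups.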

Could you include the code that you are using to submit the jobs?

Hi
A bit tricky, we have code spread across a lot of classes :-s
I'll give it a try; I hope this is enough.
Many thanks
Pascal

...
conf.set("es.nodes", "");
conf.set("es.port", esPort);
conf.set("es.index.auto.create", "true");
...
sparkContext = new JavaSparkContext(conf);
SparkSession sparkSession = sparkSQLContextHandler.getContext();

	Dataset<Row> callDs = CallService.javaRddToDataset(this.getCallJavaRdd(), Call.class);
	Dataset<Row> calleeDs = callDs.select("userid", "date_date_appel")
			.distinct()
			.withColumn("idEs", functions.concat_ws("_", callDs.col("date_date_appel"), callDs.col("userid")));

	CallService.writeToEs(callDs, callIndexAndType);
	CallService.writeToEs(calleeDs, calleeIndexAndType, "append", "idEs");

With

public void writeToEs(Dataset<Row> dataset, String esIndex) {
	String mode = "append";
	this.writeToEs(dataset, esIndex, mode);	
}

@Override
public void writeToEs(Dataset<Row> dataset, String esIndex, String mode, String id) {
	dataset.write()
			.format("org.elasticsearch.spark.sql")
			.option("es.index.auto.create", "true")
			.option("es.resource.auto.create", esIndex)
			.option("es.mapping.id", id)
			.mode(mode)
			.save(esIndex);
}

private void setCallJavaRdd() {
	@SuppressWarnings("unchecked")
	JavaRDD<Call> callRdd = (JavaRDD<Call>) CallService.getPojoRdd("call", "Call");
	this.callJavaRdd = callRdd;
	this.callJavaRdd.cache();
}

public Dataset<Row> javaRddToDataset(JavaRDD<?> dataJavaRdd, Class<?> clazz) {
	SparkSQLContextHandler sparkSQLContextHandler = (SparkSQLContextHandler) ApplicationFactory.getBean("sparkSQL");
	SparkSession sparkSession = sparkSQLContextHandler.getContext();
	Dataset<Row> dataset = sparkSession.createDataFrame(dataJavaRdd, clazz);
	return dataset;
}

I'm not sure I see where you are setting the nodes for the job. Are they being configured in a different part of the project?

Hi
I guess it is part of the context initialization:

	SparkConf conf = new SparkConf().setAppName("Spark App").setMaster("local[*]");

	this.propertyHandler = (PropertyHandler) ApplicationFactory.getBean("property");
	String esHost = propertyHandler.getValue("es.host");
	String esPort = propertyHandler.getValue("es.port");

	// Add the ES configuration to the Spark context
	conf.set("es.nodes", esHost);
	conf.set("es.port", esPort);
	conf.set("es.index.auto.create", "true");

	// Create the Spark context
	sparkContext = new JavaSparkContext(conf);

esHost is equal to localhost in the property file.
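
For what it's worth, here is a sketch of the same write with the connection settings passed directly on the writer, reusing the esHost and esPort values read from the property file (id, mode, and esIndex are the parameters from the writeToEs method above). This only illustrates making each save carry its own ES settings; the thread does not confirm it as the fix:

dataset.write()
		.format("org.elasticsearch.spark.sql")
		.option("es.nodes", esHost)   // same host read from the property file
		.option("es.port", esPort)    // same port read from the property file
		.option("es.mapping.id", id)
		.mode(mode)
		.save(esIndex);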

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.