Getting a "No data nodes with HTTP-enabled available" error when writing from Spark to Elasticsearch on Google Dataproc

I'm trying to export data from Spark to an Elasticsearch cluster running on Google Container Engine (GKE). I've deployed the ES cluster using the configs from https://github.com/pires/kubernetes-elasticsearch-cluster/tree/master/stateful, which create a couple of pods for each node type: master, client, and data.

I'm able to insert data into ES through the Spark connector if I have it connect to one of the client nodes and set

es.nodes.client.only=true
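For reference, the connector settings look roughly like this (a sketch; the service name and index are placeholders I've filled in, not from my actual job):

```properties
# elasticsearch-hadoop connector options (host/index names are placeholders)
es.nodes=elasticsearch-client    # Kubernetes Service fronting the ES client nodes (assumed name)
es.port=9200
es.nodes.client.only=true        # route all requests through client nodes only
es.resource=myindex/mytype
```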

After reading
https://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html#_network and
https://www.elastic.co/guide/en/elasticsearch/hadoop/master/cloud.html
though, I'd like to have Spark write directly to the data nodes. However, if I switch back to the default

es.nodes.client.only=false

I get this error:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
	at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
	at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:576)
	at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
	at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
	at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
	at org.apache.spark.scheduler.Task.run(Task.scala:86)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

Error summary: EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available

I'm not too well versed in Docker, but looking through the configurations for the linked deployment, I found these lines, which are probably responsible for the lack of HTTP availability:
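The relevant bit appears to be an environment variable in the data-node spec, something along these lines (reproduced from memory of the repo, so the exact names may differ):

```yaml
# from the es-data StatefulSet spec (approximate; check the repo for the exact wording)
env:
- name: HTTP_ENABLE
  value: "false"   # data nodes do not serve HTTP (port 9200) by default
```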

I also only see network configuration for transport-level traffic (port 9300, non-HTTP). I don't think this deployment is meant to be used in the way you're describing.

Thanks, toggling that to "true" does fix it.
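For anyone else trying this: besides flipping that environment variable, the data pods also need port 9200 reachable. A minimal sketch of the additions to the data-node spec, assuming the naming conventions of the linked repo (not verified against it):

```yaml
# additions to the es-data spec (a sketch; names are assumptions)
env:
- name: HTTP_ENABLE
  value: "true"          # enable the HTTP (REST) endpoint on data nodes
ports:
- containerPort: 9200    # expose HTTP alongside the transport port (9300)
  name: http
```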

I'm not sure how much of a difference this makes, but isn't it suboptimal that the Elasticsearch Spark connector communicates with Elasticsearch over HTTP rather than the transport protocol on port 9300?

The transport protocol is not backwards compatible across Elasticsearch versions the way HTTP is. There have also been a fair number of benchmarks comparing HTTP and RPC, and they have found that the two have comparable performance characteristics.

Got it. Thanks.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.