I'm trying to export data from Spark to an Elasticsearch cluster running on Google Container Engine (GKE). I've deployed the ES cluster using the configs from https://github.com/pires/kubernetes-elasticsearch-cluster/tree/master/stateful, which create a couple of nodes of each type: master, client, and data.
I'm able to insert data into ES through the Spark connector if I point it at one of the client nodes and set
es.nodes.client.only=true
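For context, the write that works looks roughly like this (a sketch using the elasticsearch-spark connector's saveToEs; the service hostname, index name, and DataFrame contents are placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._  // adds saveToEs to DataFrame

val spark = SparkSession.builder().appName("es-export").getOrCreate()
import spark.implicits._

val df = Seq(("alice", 1), ("bob", 2)).toDF("name", "value")

// Succeeds when pointed at a client node with client-only routing:
df.saveToEs("myindex/mytype", Map(
  "es.nodes"             -> "elasticsearch-client:9200", // placeholder k8s service name
  "es.nodes.client.only" -> "true"
))
```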
After reading
https://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html#_network and
https://www.elastic.co/guide/en/elasticsearch/hadoop/master/cloud.html
though, I'd like to have Spark write directly to the data nodes. However, if I switch back to the default es.nodes.client.only=false,
I get this error:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:576)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:91)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Error summary: EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available