Basic Authentication with Spark fails with 403(forbidden)

I am trying to write to Elasticsearch with Basic auth, but it fails with the error below. I tried the same request with plain Java and other tools, setting the Basic auth headers, and they all work fine. Here is the code:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._   // brings saveJsonToEs into scope

val sparkConf = new SparkConf().setAppName("ElasticSearchTest").setMaster("local[*]")

sparkConf.set("es.nodes", "<elastic cluster host>")
sparkConf.set("es.port", "9200")
sparkConf.set("es.net.http.auth.user", "Jack")
sparkConf.set("es.net.http.auth.pass", "1111111111111111")

val sc = new SparkContext(sparkConf)

val json1 = """{"title" : "test"}"""

sc.makeRDD(Seq(json1)).saveJsonToEs("a/b")
Error:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [HEAD] on [a] failed; server[xxx.xxx.xxx.xxx:9202] returned [403|Forbidden:]
	at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:368)
	at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:344)
	at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:409)
	at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:415)
	at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:443)
	at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:408)
	at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:396)
	at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
	at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
	at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

By the way, I am using version 2.1.2 of elasticsearch-spark:

compile 'org.elasticsearch:elasticsearch-spark_2.10:2.1.2'


If the page on security does not help, the best way to diagnose the issue is to turn on logging, in particular for the REST and Spark packages.
This provides information about the network traffic, so one can see where the connection is being made and what data is passed through.
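As a rough sketch, assuming a standard log4j.properties setup on the classpath, the connector's loggers can be raised to TRACE using the package names that appear in the stack trace above (org.elasticsearch.hadoop.rest and org.elasticsearch.spark):

```properties
# Hypothetical log4j.properties fragment - logger names taken from the
# packages in the stack trace; adjust appenders to match your own setup.
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
log4j.logger.org.elasticsearch.spark=TRACE
```

With these enabled, the driver/executor logs should show the exact host and port each HTTP request goes to, which would also help explain why the error reports port 9202 while es.port is set to 9200.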