Spark job is failing with authenticating with BASIC error

diplomaticguru · July 16, 2015, 4:04pm

I have an Elasticsearch cluster with Basic HTTP authentication enabled, so in my Spark configuration I set the following parameters as described in the documentation:

"es.net.http.auth.user"
"es.net.http.auth.pass"
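For reference, here is a rough sketch of how these two settings could be passed on the command line. It assumes a connector version that accepts es-hadoop settings with a `spark.` prefix, since spark-submit drops properties that do not start with `spark.`. The helper name `es_conf_flags` is made up for illustration; the host, port and credentials are the ones used later in this thread.

```python
# Sketch only: builds spark-submit "--conf" flags for the es-hadoop settings.
# es_conf_flags is a hypothetical helper, not part of es-hadoop or Spark.

def es_conf_flags(user, password, nodes="dev.ce.com", port=9200):
    settings = {
        "es.nodes": nodes,
        "es.port": str(port),
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
    }
    flags = []
    for key, value in settings.items():
        # Assumption: the connector reads "spark."-prefixed es.* properties.
        flags += ["--conf", f"spark.{key}={value}"]
    return flags


if __name__ == "__main__":
    # Prints the flags to append to a spark-submit command line.
    print(" ".join(es_conf_flags("testes", "test123")))
```

Note that the password ends up in the process arguments this way, so a properties file may be preferable on a shared machine.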

However, when I run the Spark job in my yarn-cluster, I get this error:

httpclient.HttpMethodDirector: Failure authenticating with BASIC 'Elasticsearch cluster read/write'@dev.ce.com:9200
15/07/16 17:22:23 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 25)
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [GET] on [_nodes/transport] failed; server[null] returned [401|Unauthorized:]
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:336)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:301)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:305)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:119)
    at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:101)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:58)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:372)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEsWithMeta$1.apply(EsSpark.scala:86)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEsWithMeta$1.apply(EsSpark.scala:86)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

But when I use the curl command to check the index with the user/pass, it works fine:
curl -u testes:test123 -XGET dev.ce.com:9200/test_index_es?pretty

Please let me know what I am doing incorrectly.

diplomaticguru · July 16, 2015, 5:58pm

Okay, so I did my own investigation and found the problem, but I still need your help to resolve it.

When I checked the es-hadoop source code, I found that the error is thrown when the discoverNodes() method of the RestClient class is called. This method tries to GET the node details from the "_nodes/transport" endpoint. However, the user testes:test123 does not have admin privileges, so the request is rejected with a 401.

I tried to get the "_nodes/transport" details using curl and, as expected, it failed with the error below:

 curl -u testes:test123 -XGET dev.ce.com:9200/_nodes/transport
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Authorization Required</title>
</head><body>
<h1>Authorization Required</h1>
<p>This server could not verify that you
are authorized to access the document
requested.  Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
</body></html>

The user account that I'm using only has access to a specific index (we don't want it to access everything), so it cannot access "_nodes/transport". I'm not sure what I can do other than grant admin privileges to the account, which I don't want to do. Any suggestions?
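The two curl checks in this thread can also be scripted as a quick pre-flight test before submitting the job. This is only a sketch using the Python standard library; the function names are made up for illustration, and the host, index and credentials are the ones from the curl commands.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_request(url, user, password):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    return req


def status_of(url, user, password, timeout=10):
    """Return the HTTP status code for url, including error codes such as 401."""
    try:
        req = basic_auth_request(url, user, password)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    base = "http://dev.ce.com:9200"
    # The index the account may read, then the endpoint used for node discovery.
    for path in ("/test_index_es", "/_nodes/transport"):
        print(path, status_of(base + path, "testes", "test123"))
```

If the second path comes back as 401 while the first succeeds, the account can reach the index but not the node discovery endpoint, which matches the failure in the stack trace.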

costin · July 16, 2015, 8:57pm

What system are you using for securing the cluster? The connector needs information about the index topology in order to access the nodes/shards directly - without getting access to the nodes, it cannot do any discovery (even when using the client-only option).

diplomaticguru · July 17, 2015, 10:55am

@costin, it's our own custom solution that uses an Apache proxy/redirector to provide per-index authentication. We access the search node dev.ce.com; Apache authenticates the request first and then forwards it to ES. With this authentication in place, I don't think the current es-hadoop library will work as expected unless it is customised.

diplomaticguru · July 17, 2015, 12:28pm

@costin, don't worry about this issue; we've granted our user access to _nodes.

Many thanks.

costin · July 17, 2015, 7:59pm

Glad to hear things were sorted out.
