Read from ES index fails without write permission with HEAD [405|Method Not Allowed:]

We're trying to connect to two different Elasticsearch instances and read data from Databricks on indices where a user has read permissions. For both instances we're seeing the following error:

EsHadoopInvalidRequest: [HEAD] on [index_name] failed; server[servername] returned [405|Method Not Allowed:]

From the Databricks terminal, I've also attempted to curl and am getting the same response

curl -u "username:password" --head https://server/es/indexname/_search
HTTP/1.1 405 Method Not Allowed
Date: Mon, 20 Mar 2023 12:23:12 GMT
Server: Apache/2.4.54 (Win64) OpenSSL/3.0.5
Strict-Transport-Security: max-age=63072000;includeSubDomains
X-elastic-product: Elasticsearch
content-type: application/json; charset=UTF-8
content-length: 126

The user has read permissions on the index and has [cluster:monitor/main] permissions

What other permissions are necessary and how does an account need to be configured to allow read-only access to pull data from an index?

I don't believe there's any way to do a HEAD request on <index>/_search. GET and POST are the only methods allowed. You can see it in the code at elasticsearch/ at main · elastic/elasticsearch · GitHub. Is there something built into es-hadoop/spark that is doing that HEAD request? If so, that would be a bug.

Keith - thanks for taking a look at the code to check. That makes sense, and it actually seems pretty clear from the curl response I posted that only GET and POST are allowed. I don't know what's doing the HEAD request as part of the es-hadoop call to load data. I realize I didn't post the call in this thread; I posted a similar discussion yesterday about the prefix and port options for es-hadoop. Here's the call that's producing the EsHadoopInvalidRequest response:

val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.nodes", "")
  .option("es.port", "443")
  .option("", "username")
  .option("", "password")
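
For readers trying to reproduce a setup like this behind a proxy path, here is a minimal sketch of how the connector options might be collected in one place. The setting names are standard es-hadoop configuration keys, but every value (host, prefix, credentials) is a placeholder, and whether these are the exact keys used in the call above is an assumption.

```scala
// Sketch only: all values are placeholders, not the poster's real settings.
object EsReadOptions {
  val options: Map[String, String] = Map(
    "es.nodes"              -> "server",   // hypothetical proxy host
    "es.port"               -> "443",
    "es.net.ssl"            -> "true",     // HTTPS on the proxy
    "es.nodes.wan.only"     -> "true",     // talk only to the declared node
    "es.nodes.path.prefix"  -> "/es",      // path the cluster is proxied under
    "es.net.http.auth.user" -> "username",
    "es.net.http.auth.pass" -> "password"
  )
}

// With a SparkSession named `spark` in scope (as on Databricks), a read
// could then look like this (assumed usage, not taken from the thread):
//   spark.read.format("org.elasticsearch.spark.sql")
//     .options(EsReadOptions.options)
//     .load("indexname")
```

Keeping the options in a plain Map makes it easy to print them and compare the working cluster against the failing ones.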


I'm able to run the same code against another ES instance I have access to, and that works with no issue. The only difference is that I have more permissions on that instance. I'm not sure exactly which permissions, but two of the clusters I'm working with are run by the same admin team, so they know that it works fine on the one cluster where I have basically full control. On the other two, where it's failing, they're only able to give me read access.

It looks like it's giving us the full request path in the error message, and it doesn't include _search, right? elasticsearch-hadoop/ at main · elastic/elasticsearch-hadoop · GitHub
There are a handful of places in es-spark that do a HEAD request to the index to check if it exists. I haven't figured out why that would ever return a 405 though. HEAD requests are allowed on the index (elasticsearch/ at main · elastic/elasticsearch · GitHub). And if you don't have permission to see an index I would expect to get a 404 rather than a 405. What do you get if you manually do curl -u "username:password" --head https://server/es/indexname?

Keith - thanks for helping on this.

When I run that curl I get the same response as with _search, looks like

curl -u "username:password" --head https://server/es/indexname
HTTP/1.1 405 Method Not Allowed
Date: Thu, 23 Mar 2023 05:28:05 GMT
Server: Apache/2.4.54 (Win64) OpenSSL/3.0.5
Strict-Transport-Security: max-age=63072000;includeSubDomains
Allow: POST
X-elastic-product: Elasticsearch
content-type: application/json; charset=UTF-8
content-length: 113

The only real difference compared to calling the _search endpoint is that this response claims only POST is allowed.
This is on ES version 6.17.3, Windows Server 2012 R2, running as a service with Apache Procrun.

Oh I get the same thing if I try to do a HEAD on a URL like that. The reason is that Elasticsearch is seeing the path you're asking for as es/indexname, and interpreting that as /{index}/{type}. So it's hitting this: elasticsearch/ at main · elastic/elasticsearch · GitHub, which only allows for POST.
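
The interpretation described above can be illustrated with a toy model. This is not Elasticsearch's actual router; it is a sketch that encodes only what this thread reports (HEAD is accepted on /{index}, but not on /{index}/_search or /{index}/{type}). The object name, function name, and the typed search route are assumptions added for illustration.

```scala
// Toy model of path-to-route matching, based only on the behaviour
// reported in this thread. Not Elasticsearch's real routing code.
object RouteSketch {
  // Returns the route template the path would match and whether
  // a HEAD request is accepted for that template in this model.
  def classify(path: String): (String, Boolean) = {
    val segments = path.split("/").filter(_.nonEmpty).toList
    segments match {
      case _ :: Nil                   => ("/{index}", true)
      case _ :: "_search" :: Nil      => ("/{index}/_search", false)
      case _ :: _ :: Nil              => ("/{index}/{type}", false)
      // Assumption: typed search route, as on older versions.
      case _ :: _ :: "_search" :: Nil => ("/{index}/{type}/_search", false)
      case _                          => ("unknown", false)
    }
  }
}
```

In this model `RouteSketch.classify("/es/indexname")` matches `/{index}/{type}` with "es" taken as the index name, so HEAD is rejected, while `RouteSketch.classify("/indexname")` matches `/{index}` and HEAD is accepted.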

Gotcha, thanks for the feedback. I guess that sort of gets at my other support ticket around port and prefix. But in this case we're just doing a curl...

Is there any workaround you can think of, or is this a bug that should be raised as an issue on GH?

I'm not very familiar with es.nodes.path.prefix, but it sounds like there might be two problems, right?
(1) Somehow es-hadoop is swapping the prefix and port as you describe in Prefix and port appear flipped in es-hadoop implementation - #3 by petersedivec. It would be good to get a ticket for that (with steps to reproduce)
(2) It looks like maybe your elasticsearch is not configured right, since it is failing on curl -u "username:password" --head https://server/es/indexname. I would assume that es/ would have been stripped out of the path before Elasticsearch got to it.

Hi Keith,

Yes, that's correct as far as I see it. Appreciate the help

Hey Keith, sorry for not getting back to you on this sooner. Yep, I agree with you on the two potential issues as you've written them. Any suggestion on how to configure Elasticsearch correctly so that the "es/" prefix is stripped out properly?

Sorry I didn't get back to you last week. Was on vacation with our family and would typically have responded but our car broke down on the 23rd, first day of the trip and from there that just overwhelmed me for a few days and I didn't get back to emails.


I might be wrong, but I don't think that Elasticsearch itself has any kind of support for running with a prefix. I think the assumption is that you might have elasticsearch-hadoop hitting a proxy, and that proxy strips out the prefix and sends an ordinary request to an Elasticsearch cluster that doesn't know anything about the prefix. So the configuration is all in your proxy/load balancer, not in Elasticsearch.
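
To make the proxy's job concrete, here is a minimal sketch of the path rewrite it would need to perform before forwarding a request. The object and function names are hypothetical, and a real proxy does this through its own configuration rather than through code like this.

```scala
// Sketch of the rewrite a proxy would apply before forwarding to
// Elasticsearch. Names are hypothetical; this is illustration only.
object PrefixSketch {
  // Removes `prefix` from the front of `path` when it matches a whole
  // leading path segment; otherwise returns `path` unchanged.
  def rewritePath(path: String, prefix: String): String = {
    val p = "/" + prefix.stripPrefix("/").stripSuffix("/")
    if (path == p) "/"
    else if (path.startsWith(p + "/")) path.substring(p.length)
    else path
  }
}
```

With this rewrite, `PrefixSketch.rewritePath("/es/indexname", "es")` yields `/indexname`, which Elasticsearch would see as a plain index path. With Apache httpd in front, as the Server header in the curl output suggests, this kind of mapping is typically set up with mod_proxy's ProxyPass directive, though the exact configuration depends on the deployment.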
