Issue with Loading Data from Elasticsearch into Databricks

Hello,

I'm encountering an issue while trying to load data from Elasticsearch into Databricks. Below is the code I'm using and the error message I'm receiving.

Code:

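es_read_conf = {
    "es.nodes": "your-cluster-url",  # placeholder for the cluster endpoint
    "es.port": "443",
    "es.net.http.auth.header": "Authorization: ApiKey <your-api-key>",  # API key passed as a full header line
    "es.resource": "b-s-data",
    "es.net.ssl": "true",
    "es.nodes.wan.only": "true"  # skip node discovery; talk only to the declared endpoint
}

# Note: the index name passed to load() differs from es.resource above
df = spark.read.format("org.elasticsearch.spark.sql").options(**es_read_conf).load("beat-starnet-data")

display(df)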

Error:


Py4JJavaError: An error occurred while calling o492.load.
: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:403)
    at org.elasticsearch.spark.sql.ElasticsearchRelation.cfg$lzycompute(DefaultSource.scala:234)
    ...
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: org.elasticsearch.hadoop.rest.EsHadoopRemoteException: security_exception: missing authentication credentials for REST request [/]

Details:

  • The Elasticsearch cluster is accessible, as verified by basic connectivity tests.
  • The cluster status is yellow, but there are no signs of connectivity issues.
  • I am using the correct version of the Elasticsearch-Hadoop connector for Elasticsearch 8.6.2.
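A "basic connectivity test" can pass even when authentication fails: a security-enabled Elasticsearch cluster answers unauthenticated requests with a 401 security_exception like the one in the trace above. A minimal sketch, using only the Python standard library, that sends a request to the same root endpoint (/) named in the error, once without and once with the API key. The URL and key are placeholders, and it assumes the key is the Base64 "encoded" value (base64 of id:api_key) that Elasticsearch returns when you create an API key.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_root_request(cluster_url: str, api_key: Optional[str]) -> urllib.request.Request:
    """Build GET / (the endpoint in the error), optionally with an ApiKey header."""
    headers = {}
    if api_key is not None:
        # API key auth header format: "Authorization: ApiKey <base64(id:api_key)>"
        headers["Authorization"] = f"ApiKey {api_key}"
    return urllib.request.Request(cluster_url.rstrip("/") + "/", headers=headers, method="GET")


def probe(cluster_url: str, api_key: Optional[str], timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (e.g. 200 or 401)."""
    req = build_root_request(cluster_url, api_key)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # An HTTP error status still means the cluster answered, so it is reachable.
        return err.code
```

Compare probe(url, None) with probe(url, key): 401 without the key but 200 with it suggests the key is valid and the problem is in how the connector sends it; 401 in both cases points at the key itself.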

Questions:

  1. Main Error: The primary error is security_exception: missing authentication credentials for REST request [/]. What could be causing this authentication issue?
  2. Configuration: Are there additional settings or adjustments needed for proper integration with a cloud-based Elasticsearch cluster?

Any guidance or suggestions on resolving this issue would be greatly appreciated.

Thank you!


Hi @Hernando_Segovia,

Welcome! I'll be honest, I'm not too familiar with Databricks or Apache Spark. But looking at the error, it looks like the cluster is not accessible from your code:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

From the details you've provided, can you:

  1. Diagnose the reason for the yellow cluster status as per the tips in the documentation. Specifically, if you can share the output of the cluster health API, that would be useful.
  2. Check that the Elasticsearch URL, port and API key values in your code are correct, and that the key has sufficient permissions to read from the index you want to read from?
  3. Are you using a WAN/Cloud instance as mentioned in the warning? Just checking since you have set es.nodes.wan.only to true in your code.
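The first two checks can be run outside Spark with the same credentials. A minimal sketch using the Python standard library that builds three requests: the cluster health API, the authenticate API (which identity the key resolves to), and the has-privileges API for the read privilege on one index. The URL, key, and index name are placeholders; "read" is only the privilege mentioned above, and the connector may need others.

```python
import json
import urllib.request
from typing import Dict, Optional


def es_request(cluster_url: str, api_key: str, path: str,
               body: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request; a JSON body switches the method to POST."""
    headers = {"Authorization": f"ApiKey {api_key}"}
    data = None
    method = "GET"
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
        method = "POST"
    return urllib.request.Request(cluster_url.rstrip("/") + path,
                                  data=data, headers=headers, method=method)


def diagnostic_requests(cluster_url: str, api_key: str,
                        index: str) -> Dict[str, urllib.request.Request]:
    return {
        # 1. "status" and "unassigned_shards" help explain a yellow cluster.
        "health": es_request(cluster_url, api_key, "/_cluster/health"),
        # 2. Which user or API key do these credentials resolve to?
        "authenticate": es_request(cluster_url, api_key, "/_security/_authenticate"),
        # 3. Does that identity hold the read privilege on the target index?
        "has_privileges": es_request(
            cluster_url, api_key, "/_security/user/_has_privileges",
            body={"index": [{"names": [index], "privileges": ["read"]}]},
        ),
    }


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Execute a request and decode the JSON response (raises on HTTP errors)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, send(diagnostic_requests(url, key, "beat-starnet-data")["has_privileges"]) returns a document whose has_all_requested field should be true if the key can read that index.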

Let us know!

Hi @Hernando_Segovia, I don't believe that es-hadoop supports es.net.http.auth.header (although I might be wrong). Is there anything related to authentication in the executor logs for your Spark job?
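If es.net.http.auth.header is not a recognized setting, the connector would send no credentials at all, which would match the "missing authentication credentials" error. A sketch of an alternative configuration, assuming your connector version supports the es.net.http.header.<NAME> prefix for adding a custom HTTP header to every request (please verify this in the es-hadoop configuration docs for your version). The host, key id, key secret, and the helper names encode_api_key, es_read_options and read_index are hypothetical placeholders.

```python
import base64
from typing import Dict


def encode_api_key(key_id: str, api_key: str) -> str:
    """Base64-encode "id:api_key", the value Elasticsearch expects after "ApiKey "."""
    return base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")


def es_read_options(cluster_host: str, encoded_key: str) -> Dict[str, str]:
    """Hypothetical helper: connector options for a cloud cluster over HTTPS on 443."""
    return {
        "es.nodes": cluster_host,
        "es.port": "443",
        "es.net.ssl": "true",
        # Use only the declared endpoint; skip node discovery (cloud/WAN setups).
        "es.nodes.wan.only": "true",
        # ASSUMPTION: the es.net.http.header.<NAME> prefix sends <NAME> as an HTTP
        # header on every request; check this against your connector version.
        "es.net.http.header.Authorization": f"ApiKey {encoded_key}",
        # With a username/password instead of an API key, the basic-auth settings
        # would be "es.net.http.auth.user" and "es.net.http.auth.pass".
    }


def read_index(spark, cluster_host: str, encoded_key: str, index: str):
    """Load one index as a DataFrame; spark is the Databricks SparkSession."""
    opts = es_read_options(cluster_host, encoded_key)
    return spark.read.format("org.elasticsearch.spark.sql").options(**opts).load(index)
```

In a notebook this would be called as display(read_index(spark, "your-cluster-url", "<encoded-key>", "beat-starnet-data")). Passing the index only to load() also avoids setting two different index names, as the original snippet does.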