Elastic - Spark connector failing to read data

Hi all,

I am trying to read data from Elasticsearch to Databricks (Spark) but I'm getting the following error:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

which, according to your documentation, is typically symptomatic of a connector version mismatch.

I'm running

  • Databricks runtime version 13.0 (includes Apache Spark 3.4.0, Scala 2.12)
  • Elasticsearch version 8.5.2
  • I therefore installed org.elasticsearch:elasticsearch-spark-30_2.12:8.5.2 from Maven on the Databricks cluster

From a networking perspective, I’m able to telnet to the Elasticsearch host.
However, I’m not able to pull data from the Elasticsearch server using the following command:

df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("spark.es.nodes", hostname)
      .option("spark.es.port", port)
      .option("spark.es.nodes.wan.only", "true")
      .option("spark.es.net.ssl", "true")
      .option("spark.es.net.http.auth.user", username)
      .option("spark.es.net.http.auth.pass", password)
      .load(index)
     )
display(df)
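For reference, here is the same set of options sketched with un-prefixed keys. This is an assumption on my part: the elasticsearch-hadoop configuration reference lists the settings as `es.nodes`, `es.port`, etc., and the `spark.` prefix is normally only needed when setting them in the Spark configuration rather than per-reader via `.option()`. Hostname, port, and credentials below are placeholders.

```python
# Sketch only, not verified against a live cluster. The key names follow the
# elasticsearch-hadoop configuration reference (bare "es." prefix); the
# "spark.es." form is for spark-defaults / cluster-level Spark conf.
es_options = {
    "es.nodes": "elastic.example.com",    # placeholder hostname
    "es.port": "9243",                    # placeholder port
    "es.nodes.wan.only": "true",          # needed for cloud/WAN deployments
    "es.net.ssl": "true",
    "es.net.http.auth.user": "elastic",   # placeholder credentials
    "es.net.http.auth.pass": "changeme",
}

# The read itself would then be (requires a running SparkSession):
# df = (spark.read
#       .format("org.elasticsearch.spark.sql")
#       .options(**es_options)
#       .load(index))
```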

Hello,

I have the same issue. Were you able to resolve this and connect to ES?

Hi Si,
No, not yet. Can anyone from the Elastic community help us?

Hi @ljSolaiman

I know nothing about that connector, but a common issue is using self-signed certificates.

Instead of telnet, can you try this from the client server and share the results:

curl -v -u elastic:password https://hostname:port