I am trying to read data from Elasticsearch into Databricks (Spark), but I'm getting the following error:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
which, according to your documentation, is symptomatic of a wrong driver version.
I'm running:
Databricks runtime version 13.0 (includes Apache Spark 3.4.0, Scala 2.12)
Elasticsearch version 8.5.2
I therefore installed org.elasticsearch:elasticsearch-spark-30_2.12:8.5.2 from Maven on the Databricks cluster.
From a networking perspective, I'm able to telnet to the Elasticsearch host.
However, I'm not able to pull data from the Elasticsearch server using the following command
Here is what I get from the curl command (masked IP address and hostname).
* Trying YYY.YY.YYY.YY:9200...
SSL certificate problem: unable to get local issuer certificate
Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
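The curl error (60) above points at certificate trust rather than plain reachability. As a rough way to tell the two apart from a notebook, the same handshake could be repeated with Python's standard library. This is only a sketch: `make_context`, `check_tls`, and the `ca_file` parameter are hypothetical names introduced for illustration, and the host to pass in would be the masked one from the output above.

```python
import socket
import ssl
from typing import Optional


def make_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying TLS context.

    ca_file is an optional PEM bundle (hypothetical path) holding the CA
    that signed the Elasticsearch certificate; None uses the system store.
    """
    return ssl.create_default_context(cafile=ca_file)


def check_tls(host: str, port: int = 9200,
              ca_file: Optional[str] = None, timeout: float = 5.0) -> str:
    """Attempt a TLS handshake and describe what happened."""
    context = make_context(ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return f"TLS OK ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        # Same class of failure as curl's exit code 60.
        return f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Refused connection, timeout, DNS failure, other TLS errors.
        return f"connection failed: {exc}"
```

If `check_tls(host)` reports a verification failure but `check_tls(host, ca_file=...)` with the right CA bundle succeeds, the network path is probably fine and the missing piece is trust in the issuing CA.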
I believe this is a network connectivity issue: curl reaches the server (although it fails at certificate verification, as shown above), but the spark.read command does not work. Any idea where this could come from? Thanks
I'm connecting to Elasticsearch from a Databricks notebook; the cluster runs in single-node mode. My latest trace is below:
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:160)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:442)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:438)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:406)
at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:755)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:393)
at org.elasticsearch.spark.sql.ElasticsearchRelation.cfg$lzycompute(DefaultSource.scala:234)
at org.elasticsearch.spark.sql.ElasticsearchRelation.cfg(DefaultSource.scala:231)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema$lzycompute(DefaultSource.scala:238)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema(DefaultSource.scala:238)
at org.elasticsearch.spark.sql.ElasticsearchRelation.$anonfun$schema$1(DefaultSource.scala:242)
at scala.Option.getOrElse(Option.scala:189)
at org.elasticsearch.spark.sql.ElasticsearchRelation.schema(DefaultSource.scala:242)
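The trace passes through `RestClient.mainInfo` and `InitializationUtils.discoverClusterInfo`, so the failure happens on the connector's first request to the cluster. For an HTTPS endpoint, the settings that usually matter at that point are `es.nodes.wan.only` (named in the error message) and the `es.net.ssl` family. Below is a minimal sketch of how they might be assembled for PySpark. The helper `build_es_options`, the host, the index name, and the credentials are hypothetical placeholders, not the command used above.

```python
from typing import Dict, Optional


def build_es_options(host: str, port: int = 9200,
                     user: Optional[str] = None,
                     password: Optional[str] = None,
                     truststore_path: Optional[str] = None) -> Dict[str, str]:
    """Collect elasticsearch-hadoop settings for an HTTPS cluster.

    All arguments are placeholders; truststore_path is assumed to be a
    URI (for example a file: URI) that the Spark nodes can read.
    """
    options = {
        "es.nodes": host,
        "es.port": str(port),
        # Talk only to the declared node(s) instead of discovering others.
        "es.nodes.wan.only": "true",
        # The endpoint in the curl output is served over TLS.
        "es.net.ssl": "true",
    }
    if user is not None and password is not None:
        options["es.net.http.auth.user"] = user
        options["es.net.http.auth.pass"] = password
    if truststore_path is not None:
        options["es.net.ssl.truststore.location"] = truststore_path
    return options


# Usage in a Databricks notebook, where `spark` already exists
# (index name and host are hypothetical):
# df = (spark.read.format("org.elasticsearch.spark.sql")
#       .options(**build_es_options("es.example.internal", user="u", password="p"))
#       .load("my-index"))
```

Keeping the settings in one dictionary makes it easy to print them (minus the password) when comparing what Spark sends against what works with curl.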