Pyspark: from curl to correct settings

obi134 · September 23, 2022, 7:58am

Hi there,

I'm trying to push data from Databricks/PySpark to Elasticsearch following these instructions: ElasticSearch | Databricks on AWS

Unfortunately I'm getting this error:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Running a curl command from Databricks works, and I can see the pushed data in Kibana, so the connection works in general. But how do I get from the curl command to the correct settings in Python? I already looked for the right options in Configuration | Elasticsearch for Apache Hadoop [8.11] | Elastic, but so far without success.

This is the curl command:

curl --user username:passwd -X PUT -H "Content-Type: application/json" -d '{"name":"John Doe"}' http://my.url.com/elasticsearch/test_databricks/_doc/1

Python code I tried:

# Parentheses let the chained calls span several lines in Python.
(
    df.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "my.url.com/elasticsearch/")  # host and path together
    .option("es.net.ssl", "false")
    .option("es.nodes.wan.only", "true")
    .option("es.net.http.auth.user", "username")
    .option("es.net.http.auth.pass", "passwd")
    .mode("overwrite")
    .save("index/test_databricks")
)

Thank you in advance

Keith_Massey · September 26, 2022, 4:47pm

You can get that error for a variety of reasons. Look for a "Caused by" entry in the stack trace that might give you more information. I believe it will be in your Spark driver log.

obi134 · September 29, 2022, 12:58pm

Using the option es.nodes.path.prefix fixed the issue:

# Parentheses let the chained calls span several lines in Python.
(
    df.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "my.url.com")  # host only, no path
    .option("es.nodes.path.prefix", "elasticsearch")  # path segment from the curl URL
    .option("es.net.ssl", "false")
    .option("es.nodes.wan.only", "true")
    .option("es.net.http.auth.user", "username")
    .option("es.net.http.auth.pass", "passwd")
    .mode("overwrite")
    .save("/test_databricks")
)
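
For anyone else going from a working curl URL to connector settings, here is a minimal sketch of the mapping. The helper name es_options_from_base_url is hypothetical (it is not part of es-hadoop or PySpark), and it assumes you pass the part of the URL that comes before the index name. It only sets es.port when the URL contains an explicit port; otherwise the connector's default (9200) applies, which may or may not match your setup.

```python
from urllib.parse import urlsplit


def es_options_from_base_url(base_url, user, password):
    """Hypothetical helper: split a curl-style base URL into es-hadoop options."""
    parts = urlsplit(base_url)
    options = {
        "es.nodes": parts.hostname,  # host only, no scheme or path
        "es.net.ssl": "true" if parts.scheme == "https" else "false",
        "es.nodes.wan.only": "true",
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
    }
    # Any path in front of the index name becomes the request prefix.
    prefix = parts.path.strip("/")
    if prefix:
        options["es.nodes.path.prefix"] = prefix
    # Only set the port if the URL names one explicitly (an assumption of this sketch).
    if parts.port is not None:
        options["es.port"] = str(parts.port)
    return options


# Base URL taken from the curl command above, up to (not including) the index name.
options = es_options_from_base_url(
    "http://my.url.com/elasticsearch", "username", "passwd"
)

# With a DataFrame `df` in scope, the options could then be applied in one call:
# df.write.format("org.elasticsearch.spark.sql").options(**options).mode("overwrite").save("/test_databricks")
```

For the curl URL in this thread, the sketch produces the same option values as the working write call above.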

