EsHadoopIllegalArgumentException: Cannot detect ES version

I'm getting this error:

"EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'"

I'm using pyspark to write my dataframe to elasticsearch cluster like this:

df1.write.format("org.elasticsearch.spark.sql")\
    .option("es.nodes", host)\
    .option("es.port", port)\
    .option("es.net.http.auth.user", username)\
    .option("es.net.http.auth.pass", password)\
    .option("es.resource", indexName)\
    .option("es.net.ssl.keystore.location", pathToCAFile)\
    .mode('overwrite')\
    .save()

I've tried the 'es.nodes.wan.only' option, but it makes no difference.
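For reference, my WAN-only attempt looked roughly like this (a minimal sketch — the host and port values are placeholders, not my real cluster settings):

```python
# Connection options collected in one dict; all values here are placeholders.
es_conf = {
    "es.nodes": "my-es-host",       # placeholder Elasticsearch endpoint
    "es.port": "9243",              # placeholder port
    "es.nodes.wan.only": "true",    # treat es.nodes as the only reachable address
    "es.nodes.discovery": "false",  # don't try to discover other data nodes
}

# Applied to the writer like this (requires a live SparkSession and DataFrame):
# df1.write.format("org.elasticsearch.spark.sql").options(**es_conf).save()

print(es_conf["es.nodes.wan.only"])  # → true
```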

I've checked my cluster connectivity using curl and it works fine, and I'm also able to connect to the Elasticsearch server from plain Python, but PySpark is giving me a hard time here. I'm also using the ES-Hadoop jar version that matches my Elasticsearch cluster version.

Please help me with this issue.

Hi @Piyush_Jain. Do all nodes in your Spark cluster have access to all of your Elasticsearch nodes? You could get more information about the problem by setting the logging level for org.elasticsearch.hadoop.rest to TRACE. How to do that depends on your Spark distribution, but it's usually a file called conf/log4j.properties in your Spark installation. Once you do that, you should see much more detail in your PySpark logs.
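For a log4j 1.x-style setup, the line to add would look something like this (a sketch only — the exact file name and syntax depend on your Spark version; newer Spark releases use a log4j2.properties format instead):

```properties
# conf/log4j.properties — raise es-hadoop REST client logging to TRACE
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
```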

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.