PySpark writing to ES: "Cannot detect ES version"

I'm using the following PySpark code to write a DataFrame to an Elasticsearch cluster hosted in Elastic Cloud.

(df.writeStream.outputMode("append")
        .format("org.elasticsearch.spark.sql")
        .option("checkpointLocation", "s3a://example/abc")
        .option("es.nodes.wan.only", "true")
        .option("es.nodes", "https://example.es.us-west-1.aws.found.io")
        .option("es.port", 443)
        .option("es.net.http.auth.user", user)
        .option("es.net.http.auth.pass", "************")
        .option("es.resource", index_name)
        .option("es.mapping.id", id_column)
        .option("es.write.operation", "upsert")
        .start())

But I get the following error:

: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

This seems to be a common error posted in these forums, and the typical solutions are:

  • Set "es.nodes.wan.only" to "true"
  • Ensure "es.nodes" does not point to a Cloud ID but to an https endpoint
  • Ensure "es.net.http.auth.user" and "es.net.http.auth.pass" values are correctly set

I've done all of those things, and in the past they were sufficient. In fact, the exact same code and configuration are able to write to an Elasticsearch cluster running version 8.11.3. The new cluster that I need to write to runs version 8.13.2, and that's where I get the error, so I suspect something changed between those two versions.

I've tried using several elasticsearch-spark jar versions:

  • org.elasticsearch:elasticsearch-spark-30_2.12:8.14.3
  • org.elasticsearch:elasticsearch-spark-30_2.12:8.13.2

But neither changes the error.
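
For reference, a sketch of attaching such a coordinate at session creation, assuming spark.jars.packages is used rather than spark-submit --packages; the app name here is just a placeholder:

from pyspark.sql import SparkSession

# Pull in the es-spark connector by its Maven coordinate when the session is built;
# passing the same coordinate to spark-submit --packages is equivalent.
spark = (SparkSession.builder
         .appName("es-writer")
         .config("spark.jars.packages", "org.elasticsearch:elasticsearch-spark-30_2.12:8.13.2")
         .getOrCreate())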

Is this cluster also on Elastic Cloud?

I'm not using Elastic Cloud at the moment, but if I'm not mistaken the port used is 9243, not 443.


Hi @rzotti-sd cc @leandrojmp

Actually :443 works for all endpoints now; for Elasticsearch, :9243 is "legacy" but still supported. BUT you DO need to add a port, otherwise most client libraries will default to the actual default of :9200. So when I see this

.option("es.nodes", "https://example.es.us-west-1.aws.found.io")

it is probably defaulting to :9200, which will not work. Put in the correct port and it should work:

.option("es.nodes", "https://example.es.us-west-1.aws.found.io:443")
and give it a try...


Both versions are running in Elastic Cloud.

I tried .option("es.nodes", "https://example.es.us-west-1.aws.found.io:443") and got the same error.

Is there any way I can turn on verbose logging? Or debug locally without using Spark but using the jar directly, just to test that the connection works?

Try a curl with the same URL, user, and password:

curl -u user https://example.es.us-west-1.aws.found.io:443

Ohh and @rzotti-sd welcome to the community

Connecting using your provided curl command works. I'm also able to connect via Python's elasticsearch library.

I am not that familiar with that client...

can you try the actual endpoint

something like

https://sdfgsdfgf52872baf38dfb21236.us-west-1.aws.found.io:443

Test with curl first; notice there is no .es. in the hostname.
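
Something like this, using the placeholder hostname from above:

curl -u user https://sdfgsdfgf52872baf38dfb21236.us-west-1.aws.found.io:443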

It could be a bug.... but I would think that would have been caught in the automated tests...

Can you show more of the stack trace ... anything else interesting?

There ought to be more in the stack trace. Es-hadoop gives that Cannot detect ES version... error message for any exception it catches while trying to connect to the cluster. Sometimes it can be misleading. But the "caused by" portion of the stack trace ought to tell us more.
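
In PySpark, one way to surface that nested trace is to wrap the write and print whatever exception comes back; a minimal sketch, assuming df and the es.* options are the same as in the first snippet:

try:
    query = (df.writeStream.outputMode("append")
             .format("org.elasticsearch.spark.sql")
             # ... same es.* options as above ...
             .start())
    query.awaitTermination()
except Exception as e:
    # Whether the failure happens at start() or while the query runs, the
    # exception text carries the JVM stack trace, including the "Caused by:"
    # entries that point at the underlying connection problem.
    print(e)
    raise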


That was it!

Switching to the actual, garbled endpoint, like https://sdfgsdfgf52872baf38dfb21236.us-west-1.aws.found.io:443, did the trick.

When I ran your original curl command, I actually didn't get back anything. I also didn't get an error, so I figured everything was fine and not worth mentioning. But when I use the actual endpoint in the curl command, I get back something like the following:

{
  "name" : "instance-....",
  "cluster_name" : "...",
  "cluster_uuid" : "...",
  "version" : {
    "number" : "8.13.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "...",
    "build_date" : "2024-04-05T14:45:26.420424304Z",
    "build_snapshot" : false,
    "lucene_version" : "9.10.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

For the 8.11.3 cluster, passing the https://example.es.us-west-1.aws.found.io style endpoint to curl returns a payload similar to what I get from the actual endpoint of the 8.13.2 cluster. Updating my Spark code to point to the real endpoint works too, and I can see data flow through and get added to the index.
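
For completeness, a sketch of the configuration that now works; everything is the same as my original snippet, only es.nodes changed, and the hostname below is the placeholder for the cluster's instance endpoint:

(df.writeStream.outputMode("append")
        .format("org.elasticsearch.spark.sql")
        .option("checkpointLocation", "s3a://example/abc")
        .option("es.nodes.wan.only", "true")
        # Instance endpoint (no ".es." in the hostname) with an explicit :443 port.
        .option("es.nodes", "https://sdfgsdfgf52872baf38dfb21236.us-west-1.aws.found.io:443")
        .option("es.port", 443)
        .option("es.net.http.auth.user", user)
        .option("es.net.http.auth.pass", "************")
        .option("es.resource", index_name)
        .option("es.mapping.id", id_column)
        .option("es.write.operation", "upsert")
        .start())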

How did you know to use the real endpoint and not the cleaner-looking endpoint? Is that documented somewhere?

By the way, thank you so much for your fast responses and help. I spent 4 days working on the issue. I'm so relieved it's resolved.


That means the URL was not, or is not, correct...

Whatever works with curl should probably work with that library too.