How to configure remote Elasticsearch nodes in PySpark? Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting

I want to connect Spark to Elasticsearch. I use this command:

pyspark --driver-class-path /home/bigdata/elasticsearch-hadoop-5.6.5/elasticsearch-hadoop-5.6.5/dist/elasticsearch-spark-20_2.10-5.6.5.jar --conf spark.es.nodes=107.111.111.111 --conf spark.es.port=9200
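(For reference, the same connector settings can also be supplied when the session is built inside the notebook instead of on the pyspark command line; the elasticsearch-spark JAR still has to be on the driver class path as above. A minimal sketch, with the address, port and app name as placeholders for my cluster:

from pyspark.sql import SparkSession

# Sketch: pass the connector settings on the session builder instead of via --conf.
spark = (SparkSession.builder
    .appName("es-recommender")                      # hypothetical app name
    .config("spark.es.nodes", "107.111.111.111")    # remote Elasticsearch host
    .config("spark.es.port", "9200")                # HTTP port of the cluster
    .getOrCreate())
)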

But with the command above, it doesn't connect:

Py4JJavaError: An error occurred while calling o43.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
ERROR NetworkClient: Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...

(But I want to connect to 107.111.111.111:9200, not to localhost:9200).

My code (in a Python Jupyter notebook):

PATH_TO_DATA = "../elasticsearch-spark-recommender/data/ml-latest-small"
ratings = spark.read.csv(PATH_TO_DATA + "/ratings.csv", header=True, inferSchema=True)
ratings.cache()
print("Number of ratings: %i" % ratings.count())
print("Sample of ratings:")
ratings.show(5)

ratings.write.format("es").save("demo/ratings")
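A quick way to check whether the --conf values actually reached the running session before the write (a sketch; spark here is the notebook's session):

# Sketch: read the connector settings back from the running session.
print(spark.conf.get("spark.es.nodes", "not set"))
print(spark.conf.get("spark.es.port", "not set"))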

Can you help me?
Thank you very much.

I am not too familiar with how PySpark captures settings compared to the standard Spark connector. Have you tried the settings without the spark. prefix?
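For example, the settings can be attached directly to the write itself, without the spark. prefix (a sketch only; the address and port are placeholders for your remote cluster, and es.nodes.wan.only is an optional setting that can help when only a single address is reachable from Spark):

# Sketch: pass the es-hadoop settings as options on the write instead of via --conf.
(ratings.write
    .format("es")
    .option("es.nodes", "107.111.111.111")   # remote Elasticsearch host, no "spark." prefix
    .option("es.port", "9200")               # HTTP port
    .option("es.nodes.wan.only", "true")     # optional: only talk to the listed node
    .save("demo/ratings"))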
