How to configure remote Elasticsearch nodes in PySpark? Node [] failed (Connection refused (Connection refused)); no other nodes left - aborting

I want to connect Spark to Elasticsearch. I use this command:

pyspark --driver-class-path /home/bigdata/elasticsearch-hadoop-5.6.5/elasticsearch-hadoop-5.6.5/dist/elasticsearch-spark-20_2.10-5.6.5.jar --conf --conf

But it doesn't connect:

Py4JJavaError: An error occurred while calling
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): Connection error (check network and/or proxy settings)- all nodes failed; tried [[]]
ERROR NetworkClient

Node [] failed (Connection refused (Connection refused)); no other nodes left - aborting...

(But I want to connect to a remote host, not to localhost:9200.)
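For reference, a launch command that points the connector at a remote cluster would look roughly like the sketch below. The host name `es-remote-host` and port `9200` are placeholders, not values from this thread; `es.nodes` and `es.port` are the standard elasticsearch-hadoop settings, prefixed with `spark.` so that Spark forwards them via `--conf`:

```shell
pyspark \
  --driver-class-path /home/bigdata/elasticsearch-hadoop-5.6.5/elasticsearch-hadoop-5.6.5/dist/elasticsearch-spark-20_2.10-5.6.5.jar \
  --conf spark.es.nodes=es-remote-host \
  --conf spark.es.port=9200
```

If the original command really passed `--conf` twice with no key=value after each, Spark would have nothing to forward, and the connector would fall back to its defaults.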

My code (in Python Jupyter Notebook):

PATH_TO_DATA = "../elasticsearch-spark-recommender/data/ml-latest-small"
ratings = spark.read.csv(PATH_TO_DATA + "/ratings.csv", header=True, inferSchema=True)
print("Number of ratings: %i" % ratings.count())
print("Sample of ratings:")
ratings.show()


Can you help me?
Thank you very much.

I am not too familiar with how PySpark captures settings compared with how the standard Spark connector captures them. Have you tried the settings without the spark. prefix?
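To illustrate the suggestion above: settings passed on the command line with `--conf` carry a `spark.` prefix, while the connector reads the bare keys when they are supplied directly (for example via `.option(...)` on a reader or writer). A minimal sketch of the mapping between the two forms, with a placeholder host:

```python
# The same elasticsearch-hadoop settings in their two spellings.
# "es-remote-host" is a placeholder, not a value from this thread.
conf_style = {
    "spark.es.nodes": "es-remote-host",  # form used with pyspark --conf
    "spark.es.port": "9200",
}

# Stripping the "spark." prefix yields the keys the connector itself
# understands, e.g. when passed via DataFrameReader.option(...):
connector_style = {k[len("spark."):]: v for k, v in conf_style.items()}
print(connector_style)  # {'es.nodes': 'es-remote-host', 'es.port': '9200'}
```

Trying both spellings is a quick way to rule out the settings being silently dropped before they reach the connector.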
