How to configure remote Elasticsearch nodes in PySpark? Node [] failed (Connection refused (Connection refused)); no other nodes left - aborting

(Luan Ha Thanh) #1

I want to connect Spark to Elasticsearch. I use this command:

pyspark --driver-class-path /home/bigdata/elasticsearch-hadoop-5.6.5/elasticsearch-hadoop-5.6.5/dist/elasticsearch-spark-20_2.10-5.6.5.jar --conf --conf

But it doesn't connect:

Py4JJavaError: An error occurred while calling
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): Connection error (check network and/or proxy settings)- all nodes failed; tried [[]]
ERROR NetworkClient

Node [] failed (Connection refused (Connection refused)); no other nodes left - aborting...

(But I want to connect to, not to localhost:9200).

My code (in Python Jupyter Notebook):

PATH_TO_DATA = "../elasticsearch-spark-recommender/data/ml-latest-small"
ratings = spark.read.csv(PATH_TO_DATA + "/ratings.csv", header=True, inferSchema=True)
print("Number of ratings: %i" % ratings.count())
print("Sample of ratings:")


Can you help me?
Thank you very much.

(James Baiera) #2

I am not too familiar with how PySpark captures settings compared with how the standard Spark connector captures them. Have you tried the settings without the "spark." prefix?
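
To make the two spellings concrete, here is a minimal sketch, not a confirmed fix for this thread. It assumes the standard elasticsearch-hadoop keys "es.nodes", "es.port" and "es.nodes.wan.only". The host name, the helper names (es_options, to_conf_args, write_to_es) and the "ratings/ratings" resource are hypothetical placeholders. On the command line Spark only accepts properties that start with "spark.", so the keys become "spark.es.*" there. When passed as DataFrame writer options, the keys are used without that prefix.

```python
# Hypothetical placeholder: replace with the address of your remote Elasticsearch node.
ES_HOST = "es-host.example.com"
ES_PORT = 9200


def es_options(host, port=9200, wan_only=False):
    """Build elasticsearch-hadoop settings using the unprefixed "es." keys."""
    return {
        "es.nodes": host,
        "es.port": str(port),
        # Assumption: only set this to true if Spark cannot reach the
        # cluster's internal node addresses (e.g. cloud or NAT setups).
        "es.nodes.wan.only": str(wan_only).lower(),
    }


def to_conf_args(options):
    """Turn the settings into --conf arguments for pyspark/spark-submit,
    adding the "spark." prefix that Spark requires on the command line."""
    args = []
    for key, value in sorted(options.items()):
        args += ["--conf", "spark.%s=%s" % (key, value)]
    return args


def write_to_es(df, resource, options):
    """Write a Spark DataFrame to Elasticsearch with per-call options.

    `resource` is the target, e.g. "ratings/ratings" (hypothetical
    index/type pair).
    """
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**options)  # unprefixed "es.*" keys
       .mode("append")
       .save(resource))


opts = es_options(ES_HOST, ES_PORT)
# Arguments to append to the pyspark command line:
print(" ".join(to_conf_args(opts)))
```

Inside the notebook, write_to_es(ratings, "ratings/ratings", opts) would pass the same settings directly to the connector for that one write, which sidesteps the question of how the command-line settings are picked up.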
