How to configure remote Elasticsearch nodes in PySpark? Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting


(Luan Ha Thanh) #1

I want to connect Spark to Elasticsearch. I launch PySpark with this command:

pyspark --driver-class-path /home/bigdata/elasticsearch-hadoop-5.6.5/elasticsearch-hadoop-5.6.5/dist/elasticsearch-spark-20_2.10-5.6.5.jar --conf spark.es.nodes=107.111.111.111 --conf spark.es.port=9200

But it doesn't connect:

Py4JJavaError: An error occurred while calling o43.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
ERROR NetworkClient: Node [127.0.0.1:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...

But I want to connect to 107.111.111.111:9200, not to localhost:9200.

My code (in Python Jupyter Notebook):

PATH_TO_DATA = "../elasticsearch-spark-recommender/data/ml-latest-small"

# Load the MovieLens ratings and cache them, since they are counted and shown below
ratings = spark.read.csv(PATH_TO_DATA + "/ratings.csv", header=True, inferSchema=True)
ratings.cache()
print("Number of ratings: %i" % ratings.count())
print("Sample of ratings:")
ratings.show(5)

# Write the DataFrame to Elasticsearch under "demo/ratings"
ratings.write.format("es").save("demo/ratings")
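A quick way to check whether the --conf values actually reached the session is to print the configuration the running context holds; a minimal check, using the notebook's spark session:

# List every es-related setting the running SparkContext holds.
# If spark.es.nodes is missing here, the --conf value never made it in.
for key, value in spark.sparkContext.getConf().getAll():
    if "es." in key:
        print(key, "=", value)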

Can you help me?
Thank you very much.


(James Baiera) #2

I am not too familiar with how PySpark captures settings as opposed to how the standard Spark connector captures them. Have you tried the settings without the spark. prefix, i.e. es.nodes and es.port?
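For example, the same settings can be passed directly on the writer, which keeps Spark's prefix handling out of the picture entirely. A minimal sketch, assuming the cluster really is reachable at 107.111.111.111:9200 (es.nodes.wan.only is a real connector setting, but whether you need it depends on whether the client can reach the data nodes directly):

# Hand the connector its settings without the spark. prefix.
ratings.write.format("es") \
    .option("es.nodes", "107.111.111.111") \
    .option("es.port", "9200") \
    .option("es.nodes.wan.only", "true") \
    .save("demo/ratings")

If es.nodes never reaches the connector, it falls back to its default of localhost:9200, which is exactly the [127.0.0.1:9200] your stack trace shows.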


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.