How to configure remote Elasticsearch nodes in PySpark? Node [] failed (Connection refused (Connection refused)); no other nodes left - aborting

I want to connect Spark to Elasticsearch. I use this command:

pyspark --driver-class-path /home/bigdata/elasticsearch-hadoop-5.6.5/elasticsearch-hadoop-5.6.5/dist/elasticsearch-spark-20_2.10-5.6.5.jar --conf --conf

But it doesn't connect:

Py4JJavaError: An error occurred while calling
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): Connection error (check network and/or proxy settings)- all nodes failed; tried [[]]
ERROR NetworkClient

Node [] failed (Connection refused (Connection refused)); no other nodes left - aborting...

(But I want to connect to a remote host, not to localhost:9200.)
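For reference, a launch command that points the connector at a remote cluster would look roughly like the sketch below. The host name `es-remote-host` and port `9200` are placeholders, not values from this thread; `es.nodes` and `es.port` are the standard elasticsearch-hadoop settings, prefixed with `spark.` so that Spark forwards them via `--conf`:

```shell
pyspark \
  --driver-class-path /home/bigdata/elasticsearch-hadoop-5.6.5/elasticsearch-hadoop-5.6.5/dist/elasticsearch-spark-20_2.10-5.6.5.jar \
  --conf spark.es.nodes=es-remote-host \
  --conf spark.es.port=9200
```

If the original command really passed `--conf` twice with no key=value after each, Spark would have nothing to forward, and the connector would fall back to its defaults.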

My code (in Python Jupyter Notebook):

PATH_TO_DATA = "../elasticsearch-spark-recommender/data/ml-latest-small"
ratings = spark.read.csv(PATH_TO_DATA + "/ratings.csv", header=True, inferSchema=True)
print("Number of ratings: %i" % ratings.count())
print("Sample of ratings:")
ratings.show()


Can you help me?
Thank you very much.

I am not too familiar with how PySpark captures settings compared with how the standard Spark connector captures them. Have you tried the settings without the spark. prefix?
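To illustrate the suggestion above: settings passed on the command line with `--conf` carry a `spark.` prefix, while the connector reads the bare keys when they are supplied directly (for example via `.option(...)` on a reader or writer). A minimal sketch of the mapping between the two forms, with a placeholder host:

```python
# The same elasticsearch-hadoop settings in their two spellings.
# "es-remote-host" is a placeholder, not a value from this thread.
conf_style = {
    "spark.es.nodes": "es-remote-host",  # form used with pyspark --conf
    "spark.es.port": "9200",
}

# Stripping the "spark." prefix yields the keys the connector itself
# understands, e.g. when passed via DataFrameReader.option(...):
connector_style = {k[len("spark."):]: v for k, v in conf_style.items()}
print(connector_style)  # {'es.nodes': 'es-remote-host', 'es.port': '9200'}
```

Trying both spellings is a quick way to rule out the settings being silently dropped before they reach the connector.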
