I ran into problems using the Elasticsearch connector for Spark described here: https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html. I could not even get the examples from that page working against a plain-vanilla Elasticsearch 7.4.0 instance that I downloaded and started via
<downloadDir>/bin/elasticsearch
Here is exactly what I did. First, I started the Spark shell with the connector package:
spark-shell --packages "org.elasticsearch:elasticsearch-hadoop:7.4.0"
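(For a localhost cluster I assumed no extra connector configuration was needed, since my reading of the configuration docs is that es.nodes defaults to localhost and es.port to 9200. For what it's worth, my understanding is that in a standalone application those same properties would be set on the SparkConf before creating the context, roughly like the sketch below; the property names are from the docs, but the values and app name are just my assumptions and this is untested on my part:)

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of explicit connector config in a standalone app
// (values are what I believe are the defaults anyway)
val conf = new SparkConf()
  .setAppName("es-spark-test")    // hypothetical app name
  .set("es.nodes", "127.0.0.1")
  .set("es.port", "9200")
val sc = new SparkContext(conf)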
Then I typed in the code from the documentation page referenced above:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._   // brings the saveToEs method into scope

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

// index both maps as documents under spark/docs
spark.sparkContext.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
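From the configuration docs, I believe the connection settings can also be passed per call as a Map, so (continuing in the same shell session) a variant like the following should be equivalent; untested on my side, and the values are just the defaults I'd expect, so I doubt they change anything by themselves:

// My assumption: pin the connector explicitly to the local node
val cfg = Map(
  "es.nodes" -> "127.0.0.1",
  "es.port" -> "9200"
)
spark.sparkContext.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", cfg)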
Instead of the documents being written, I got some strange errors indicating the connector was trying a node I never configured, [172.20.0.3:9200], and then failing even after falling back to the default node [127.0.0.1:9200]:
[Stage 0:> (0 + 12) / 12]20/10/13 19:39:21 ERROR NetworkClient: Node [172.20.0.3:9200] failed (org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 60000 ms); selected next node [127.0.0.1:9200]
20/10/13 19:39:21 ERROR NetworkClient: Node [172.20.0.3:9200] failed (org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 60000 ms); selected next node [127.0.0.1:9200]
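I never configured 172.20.0.3 anywhere, so my guess is that the connector discovers the cluster's other published addresses and tries one of those first. My reading of the configuration docs is that es.nodes.wan.only (or es.nodes.discovery) is meant to restrict the client to the declared nodes, so a variant like this sketch might be relevant, though I haven't confirmed it is the right knob:

// Assumption: restrict the connector to the declared node and skip node discovery
val cfg = Map(
  "es.nodes" -> "127.0.0.1",
  "es.port" -> "9200",
  "es.nodes.wan.only" -> "true"
)
spark.sparkContext.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", cfg)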
Note that if I type http://127.0.0.1:9200/ into my browser's URL bar, I get back a JSON document indicating the cluster is up on localhost:9200. So I'm stumped! Any guidance would be much appreciated.