Elasticsearch-hadoop Spark connector unable to connect/write using an out-of-the-box ES server setup and default library settings

I had some problems using the Elasticsearch connector for Spark described here: https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html. I could not even get the examples on that page working against a plain vanilla instance of Elasticsearch 7.4.0 that I downloaded and started via:

<downloadDir>/bin/elasticsearch 

Here is what I ran. I started Spark with the command:

spark-shell --packages "org.elasticsearch:elasticsearch-hadoop:7.4.0"

Then I typed in the lines from the code given on the documentation page referenced above:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

spark.sparkContext.makeRDD( Seq(numbers, airports)).saveToEs("spark/docs")

I got some strange errors indicating that the connector was connecting to something other than the default node [127.0.0.1:9200], and then failing even with that node:

[Stage 0:>                                                        (0 + 12) / 12]
20/10/13 19:39:21 ERROR NetworkClient: Node [172.20.0.3:9200] failed (org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 60000 ms); selected next node [127.0.0.1:9200]
20/10/13 19:39:21 ERROR NetworkClient: Node [172.20.0.3:9200] failed (org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 60000 ms); selected next node [127.0.0.1:9200]

Note that if I type http://127.0.0.1:9200/ in my browser URL bar I get back a JSON doc indicating the cluster is up on localhost:9200. So, I'm stumped! Any guidance much appreciated.

The beginning of that message mentions 172.20.0.3:9200. Where might that be coming from? I assume Elasticsearch is not listening on that IP by default.
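If it helps narrow things down: as far as I know, the connector discovers the other nodes of a cluster through the node it is pointed at, so an unexpected address would typically be one that the cluster itself advertises. Below is a rough sketch for checking that, assuming the standard `_nodes/http` endpoint and the default local address from this thread. The helper names `fetchNodesHttp` and `publishAddresses` are made up for illustration, and the regex is a shortcut rather than a proper JSON parser.

```scala
import scala.io.Source

// Fetch the raw JSON of the nodes-info API, restricted to the HTTP section.
// The base URL is an assumption: the default local address used in this thread.
def fetchNodesHttp(baseUrl: String = "http://127.0.0.1:9200"): String = {
  val src = Source.fromURL(s"$baseUrl/_nodes/http")
  try src.mkString finally src.close()
}

// Pull every "publish_address" value out of the response text.
// A regex keeps the sketch dependency-free; a JSON library would be more robust.
def publishAddresses(json: String): List[String] = {
  val pattern = "\"publish_address\"\\s*:\\s*\"([^\"]+)\"".r
  pattern.findAllMatchIn(json).map(_.group(1)).toList
}
```

In spark-shell or a Scala REPL, `publishAddresses(fetchNodesHttp()).foreach(println)` would list the HTTP addresses the responding cluster advertises. If 172.20.0.3:9200 shows up there, the node answering on port 9200 is the one publishing it.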

Hi. Regarding that extra IP address: I am not sure. I did trace through the code a bit and found that the IP address was being added somewhere here: https://github.com/chenrun0210/elasticsearch-hadoop-7.0/blob/e7c263d8b2d65e4fa0023e1f1cbe762536819f4d/mr/src/main/java/org/elasticsearch/hadoop/rest/NetworkClient.java#L67. I gave up on tracing through the problem since this is such a simple use case. I was hoping someone would spot whatever dumb thing I did that is preventing this from working.

So, the problem was that I had another instance of Elasticsearch running in another window, listening on the same port. That always hoses things in strange ways. So this adapter has no problem at all; the problem was me.
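For anyone who lands here with the same symptom and cannot simply shut down the stray instance, one thing that might help is pinning the connector to the node you actually mean. This is only a sketch: the keys are settings from the elasticsearch-hadoop configuration reference, the values assume the default local setup above, and the name `esConf` is just an illustration. It does not fix the underlying conflict.

```scala
// Connector settings that restrict the connector to one declared node.
// Values assume the local, default Elasticsearch setup from this thread.
val esConf: Map[String, String] = Map(
  "es.nodes"          -> "127.0.0.1", // node the connector should contact
  "es.port"           -> "9200",      // HTTP port of that node
  "es.nodes.wan.only" -> "true"       // use only the declared nodes, skip discovery
)
```

The map can be passed as the second argument of the write call, for example `spark.sparkContext.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", esConf)`. With `es.nodes.wan.only` enabled the connector should talk only to the declared address instead of any other node the cluster advertises.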
