I ran into problems using the Elasticsearch connector for Spark described here: https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html. I could not even get the examples from that page working against a plain-vanilla Elasticsearch 7.4.0 instance that I downloaded and started via
<downloadDir>/bin/elasticsearch
Here is exactly what I did. First, I started the Spark shell with the connector package:
spark-shell --packages "org.elasticsearch:elasticsearch-hadoop:7.4.0"
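(For a localhost cluster I assumed no extra connector configuration was needed, since my reading of the configuration docs is that es.nodes defaults to localhost and es.port to 9200. For what it's worth, my understanding is that in a standalone application those same properties would be set on the SparkConf before creating the context, roughly like the sketch below; the property names are from the docs, but the values and app name are just my assumptions and this is untested on my part:)

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of explicit connector config in a standalone app
// (values are what I believe are the defaults anyway)
val conf = new SparkConf()
  .setAppName("es-spark-test")    // hypothetical app name
  .set("es.nodes", "127.0.0.1")
  .set("es.port", "9200")
val sc = new SparkContext(conf)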
Then I typed in the code from the documentation page referenced above:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._   // brings the saveToEs method into scope

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

// index both maps as documents under spark/docs
spark.sparkContext.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")
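From the configuration docs, I believe the connection settings can also be passed per call as a Map, so (continuing in the same shell session) a variant like the following should be equivalent; untested on my side, and the values are just the defaults I'd expect, so I doubt they change anything by themselves:

// My assumption: pin the connector explicitly to the local node
val cfg = Map(
  "es.nodes" -> "127.0.0.1",
  "es.port" -> "9200"
)
spark.sparkContext.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", cfg)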
Instead of the documents being written, I got some strange errors indicating the connector was trying a node I never configured, [172.20.0.3:9200], and then failing even after falling back to the default node [127.0.0.1:9200]:
[Stage 0:> (0 + 12) / 12]20/10/13 19:39:21 ERROR NetworkClient: Node [172.20.0.3:9200] failed (org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 60000 ms); selected next node [127.0.0.1:9200]
20/10/13 19:39:21 ERROR NetworkClient: Node [172.20.0.3:9200] failed (org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 60000 ms); selected next node [127.0.0.1:9200]
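I never configured 172.20.0.3 anywhere, so my guess is that the connector discovers the cluster's other published addresses and tries one of those first. My reading of the configuration docs is that es.nodes.wan.only (or es.nodes.discovery) is meant to restrict the client to the declared nodes, so a variant like this sketch might be relevant, though I haven't confirmed it is the right knob:

// Assumption: restrict the connector to the declared node and skip node discovery
val cfg = Map(
  "es.nodes" -> "127.0.0.1",
  "es.port" -> "9200",
  "es.nodes.wan.only" -> "true"
)
spark.sparkContext.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", cfg)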
Note that if I type http://127.0.0.1:9200/ into my browser's URL bar, I get back a JSON document indicating the cluster is up on localhost:9200. So I'm stumped! Any guidance would be much appreciated.