Spark Streaming to Elasticsearch ERROR NetworkClient: Connection timed out: connect

(Dar Varley) #1

I'm hitting the following when trying to save a Spark DataFrame to Elasticsearch

16/03/30 18:28:38 ERROR NetworkClient: Node [] failed (Connection timed out: connect); selected next node []

I can reach Elasticsearch from the same machine the Spark app is running on:

nc -zv 9200: Connection to 9200 port [tcp/*] succeeded!

My code

SparkConf conf = new SparkConf();
conf.set("", "true");
conf.set("es.nodes", "");

JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
JavaDStream<String> lines = jssc.socketTextStream("localhost", 7654);

JavaDStream<String> eventLines = lines.filter((String line) -> line.contains("Event"));
eventLines.foreachRDD((JavaRDD<String> rdd) -> {
    if (!rdd.isEmpty()) {

        SQLContext sqlContext = SQLContext.getOrCreate(rdd.context());
        DataFrame dataFrame = ...; // DataFrame creation elided in the original post

        DataFrame resultDataFrame = sqlContext.sql("Select * from Events");
        JavaEsSparkSQL.saveToEs(resultDataFrame, "event/states", ImmutableMap.of("", "eventId"));
    }
});


Full stacktrace from Spark app

(Dar Varley) #2

Solved the problem by adding

    conf.set("es.nodes.discovery", "false");
    conf.set("", "false");
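For anyone hitting the same timeout, a minimal sketch of how these elasticsearch-hadoop settings fit together on the SparkConf is below. The host value is a placeholder, and `es.nodes.wan.only` is an additional setting worth trying (not from the original post) when Spark can only reach Elasticsearch through a published or proxied address, e.g. in a cloud or Docker setup:

```java
SparkConf conf = new SparkConf().setAppName("es-writer");

// Only talk to the nodes listed in es.nodes; skip discovery of the
// rest of the cluster (the fix that resolved the timeout above)
conf.set("es.nodes.discovery", "false");

conf.set("es.nodes", "localhost"); // placeholder: your ES host
conf.set("es.port", "9200");       // default ES REST port

// Create the target index automatically if it does not exist
conf.set("es.index.auto.create", "true");

// Assumption worth testing: if the data nodes are not directly
// reachable from Spark (NAT, Docker, cloud), route all traffic
// through the declared nodes only
conf.set("es.nodes.wan.only", "true");
```

With `es.nodes.discovery` set to `false`, the connector never asks the cluster for its node list, so a node address that is unreachable from the Spark machines can no longer be "selected" and time out.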

(Costin Leau) #3

"Interesting" fix. It reduces the number of calls to the cluster and it does allow master nodes to be used; since you are only using one node, it will end up using that one all the time.
