Spark Streaming to Elasticsearch ERROR NetworkClient: Connection timed out: connect

I'm hitting the following error when trying to save a Spark DataFrame to Elasticsearch:

16/03/30 18:28:38 ERROR NetworkClient: Node [172.18.0.2:9200] failed (Connection timed out: connect); selected next node [10.123.45.67:9200]

I can reach Elasticsearch from the same machine the Spark app is running on:
nc -zv 10.123.45.67 9200
Connection to 10.123.45.67 9200 port [tcp/*] succeeded!

My code:

import com.google.common.collect.ImmutableMap;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

SparkConf conf = new SparkConf();
conf.setMaster("local[*]");
conf.setAppName("Analyser");
conf.set("es.index.auto.create", "true");
conf.set("es.nodes", "10.123.45.67:9200");

// 1-second micro-batches, reading one event per line from a socket
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
JavaDStream<String> lines = jssc.socketTextStream("localhost", 7654);

JavaDStream<String> eventLines = lines.filter((String line) -> line.contains("Event"));
eventLines.foreachRDD((JavaRDD<String> rdd) -> {
    if (!rdd.isEmpty()) {
        // Parse the JSON lines into a DataFrame and expose it to Spark SQL
        SQLContext sqlContext = SQLContext.getOrCreate(rdd.context());
        DataFrame dataFrame = sqlContext.read().json(rdd);
        dataFrame.registerTempTable("Events");

        DataFrame resultDataFrame = sqlContext.sql("SELECT * FROM Events");
        resultDataFrame.show(false);
        // Index into event/states, using the eventId field as the document _id
        JavaEsSparkSQL.saveToEs(resultDataFrame, "event/states", ImmutableMap.of("es.mapping.id", "eventId"));
    }
});

jssc.start();
jssc.awaitTermination();
jssc.stop();

Full stack trace from the Spark app:
https://gist.githubusercontent.com/dkirrane/8485d8d6f4c422310ec69d0e89271b35/raw/25f7ee5dc050b8cf88997bf2e9cd1d35d09a7112/45834.log

Solved the problem by adding the two settings below, which keep the connector pinned to the address I configured instead of the unreachable discovered address (172.18.0.2) shown in the error above:

    conf.set("es.nodes.discovery", "false");
    conf.set("es.nodes.data.only", "false");

"Interesting" fix. It reduces the number of calls to the cluster and it does allow master nodes to be used - considering you are only using one node, it will end up using that one all the time.