Data from elasticsearch cluster to Azure HDInsight cluster


#1

I am trying to use Hive to load data from my two-node Elasticsearch cluster into HDInsight, the Microsoft Azure Hadoop cluster.

I run curl 'localhost:9200/_cat/master?v' to find my ES cluster's master node; assume it is 192.168.0.48.

I start Hive and run ADD JAR /path/to/elasticsearch-hadoop-2.3.2.jar

I create an external table as follows, and this succeeds.

CREATE EXTERNAL TABLE test (
  time TIMESTAMP,
  host STRING,
  type STRING,
  message STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'logstash-2016.06.07/syslog',
  'es.nodes' = '192.168.0.48',
  'es.port' = '9200',
  'es.nodes.wan.only' = 'true',
  'es.mapping.names' = 'time:@timestamp , host:host , type:type , message:message'
);

So far, everything is OK.

But when I use ES-Hadoop to load the data from the Elasticsearch cluster into Azure HDInsight, I get an error:

Failed with exception java.io.IOException:org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

I have set es.nodes.wan.only to both true and false, and I get this error either way.

Could someone help me?


(James Baiera) #2

Hello!

Setting the es.nodes.wan.only property to true just configures elasticsearch-hadoop to not do any further node discovery and to only use the provided es.nodes hosts for connections. It seems that your provided es.nodes IP is a private network IP. Are you able to normally reach the Elasticsearch node from one of the HDInsight nodes? For instance, if you replaced localhost with your given IP inside of the curl command, and then executed the command from a node in the HDInsight cluster, does it return anything?
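As a minimal sketch of that check, something like the following could be run from a shell on one of the HDInsight nodes. It assumes curl is installed there and that Elasticsearch is served over plain HTTP; the helper names es_root_url and es_reachable are hypothetical, made up for this illustration, and the 5-second timeout is an arbitrary choice.

```shell
# Run this on an HDInsight node, not on the Elasticsearch host.
# es_root_url and es_reachable are hypothetical helper names.

# Print the root URL of an Elasticsearch HTTP endpoint.
# $1 = host or IP, $2 = port (defaults to 9200).
es_root_url() {
  printf 'http://%s:%s/\n' "$1" "${2:-9200}"
}

# Request the root endpoint and print whatever the node returns.
# Exits non-zero on connection failure, timeout, or an HTTP error status.
# $1 = host or IP, $2 = port, $3 = timeout in seconds (defaults to 5).
es_reachable() {
  curl --silent --show-error --fail --max-time "${3:-5}" \
    "$(es_root_url "$1" "$2")"
}
```

Calling es_reachable 192.168.0.48 9200 should print a small JSON document that includes the cluster's version number if the node is reachable. A timeout or "connection refused" here would suggest a network problem (routing, firewall, or Elasticsearch bound only to a local interface) rather than a Hive or ES-Hadoop configuration problem.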

Could you also paste a full stack trace if you have one? In some cases it can help highlight the issue when it's all present.

