Elasticsearch and CDH 5.4 Installation

I have a 1-node (pseudo-cluster) CDH 5.4 installation with the latest version of ES. I have downloaded es-hadoop as well. What do I need to do to link ES and es-hadoop? I don't understand the installation instructions on GitHub. Thanks!

Maybe the reference documentation, linked in the README, will help.

Here is my first attempt at a MapReduce job with input in ES and output in HDFS:



hadoop jar mrinpes.jar MRInpES temperature/logs output -libjars elasticsearch-hadoop-mr-2.1.2.jar

mrinpes.jar is in the 1st directory of the HADOOP_CLASSPATH, and the elasticsearch-hadoop jars are in the 2nd directory.

I still get the error: Exception in thread "main" java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsInputFormat

I searched the elasticsearch-hadoop jar file, and the class file is there:
[anant@psvc01nodecdh4 jarfiles]$ jar tvf /opt/esh/dist/elasticsearch-hadoop-mr-2.1.2.jar | grep "EsInputFormat"
2068 Thu Oct 29 17:47:36 MST 2015 org/elasticsearch/hadoop/mr/EsInputFormat$AbstractWritableShardRecordReader.class
1867 Thu Oct 29 17:47:36 MST 2015 org/elasticsearch/hadoop/mr/EsInputFormat$JsonWritableShardRecordReader.class
3676 Thu Oct 29 17:47:36 MST 2015 org/elasticsearch/hadoop/mr/EsInputFormat$ShardInputSplit.class
9248 Thu Oct 29 17:47:36 MST 2015 org/elasticsearch/hadoop/mr/EsInputFormat$ShardRecordReader.class
3336 Thu Oct 29 17:47:36 MST 2015 org/elasticsearch/hadoop/mr/EsInputFormat$WritableShardRecordReader.class
7199 Thu Oct 29 17:47:36 MST 2015 org/elasticsearch/hadoop/mr/EsInputFormat.class

The Hadoop version:

[anant@psvc01nodecdh4 jarfiles]$ hadoop version
Hadoop 2.6.0-cdh5.4.5
Subversion http://github.com/cloudera/hadoop -r ab14c89fe25e9fb3f9de4fb852c21365b7c5608b
Compiled by jenkins on 2015-08-12T21:12Z
Compiled with protoc 2.5.0
From source with checksum d31cb7e46b8602edaf68d335b785ab
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.5.jar

Can anyone help me with this issue? What am I missing?


You have not set up the classpath properly. HADOOP_CLASSPATH is not reliable: the Hadoop scripts often compute it automatically, so the value you set might be overwritten.
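
To check whether the jar actually reaches the client side, you can print the classpath that the hadoop script computes. This is only a rough sketch, assuming the `hadoop classpath` subcommand is available on this CDH install. Keep in mind that directories may show up as wildcard entries (for example `.../lib/*`), in which case the jar name will not appear literally even though the jar is picked up.

```shell
# Print the computed classpath one entry per line and look for es-hadoop.
# A miss is only a hint: wildcard entries hide the individual jar names.
hadoop classpath | tr ':' '\n' | grep -i elasticsearch \
  || echo "no elasticsearch-hadoop entry found on the computed classpath"
```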

Do note that -libjars only works if your code uses GenericOptionsParser and Tool. If it does not, it is better to simply embed the es-hadoop jar within your own jar so you do not have to worry about the classpath.
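
A minimal sketch of the embedding approach, assuming the jar location shown earlier in the thread (/opt/esh/dist) and that mrinpes.jar sits in the current directory. It relies on `hadoop jar` adding any jars found under lib/ inside the job jar to the classpath; adjust the paths to your setup.

```shell
# Path taken from the thread; change it if your es-hadoop jar lives elsewhere.
ESH_JAR=/opt/esh/dist/elasticsearch-hadoop-mr-2.1.2.jar

# Stage the dependency under lib/ so it lands at lib/<name>.jar inside the job jar.
mkdir -p lib
cp "$ESH_JAR" lib/

# Add the staged jar to the existing job jar (u = update, f = jar file).
jar uf mrinpes.jar lib/elasticsearch-hadoop-mr-2.1.2.jar

# Confirm the dependency is now embedded.
jar tf mrinpes.jar | grep '^lib/'

# Run the job without -libjars; the embedded jar travels with the job jar.
hadoop jar mrinpes.jar MRInpES temperature/logs output
```

With this layout neither HADOOP_CLASSPATH nor -libjars is needed for the es-hadoop classes, at the cost of a larger job jar that has to be rebuilt when you upgrade es-hadoop.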