I have a problem I can't solve, and I have tried everything I could find on the internet or think of myself.
I'm trying to integrate Elasticsearch (2.0.2) with Cloudera (5.4.0), installed with Cloudera Manager.
From Hive, if I run "ADD JAR" first and then create an external table with "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'", it works. The jar files aren't in HDFS; they are on the local filesystem.
But I want to add this directly to the Hive config. I added the property to hive-site.xml from the command line, directly on the server, and I also added it in the Cloudera Manager option "Advanced Configuration Snippet (Safety Valve) for hive-site.xml" at every level: service-wide, server only, metastore...
I tried the path with and without "file://", but I always get this message:
"Error while compiling statement: FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'"
First off, for the latest releases of CDH, do use 2.1.0.Beta4, since things changed in Hadoop land in a binary-incompatible way and 2.0.2 is likely not going to work.
Second, and again this is not particular to es-hadoop, you need the jars to be in HDFS. If they aren't, they need to be on all the nodes where the Hive script might run, at the same path.
Basically Hive doesn't handle provisioning, so the script is executed as is - if the files are in HDFS, then all nodes have access to them. If you use a file:// URI, you refer to the local filesystem on every node and thus, unless the file is there, it won't be found.
This is explained in the es-hadoop docs but also in the Hive documentation. I believe, but I'm not sure, that CDH has some mechanism to help with uploading the files you want into HDFS. I don't know what it is, though their docs should explain it.
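To make the file:// versus HDFS distinction above concrete, here is a minimal Python sketch that classifies a jar location by its URI scheme. The function name jar_visibility and the example paths are hypothetical, and the mapping only restates the reasoning in this reply; it is not how Hive itself resolves paths.

```python
from urllib.parse import urlsplit


def jar_visibility(uri: str) -> str:
    """Describe where a jar URI is resolved (hypothetical helper, not a Hive API)."""
    scheme = urlsplit(uri).scheme
    if scheme == "hdfs":
        # Stored in HDFS: every node in the cluster reads the same file.
        return "shared"
    if scheme == "file":
        # Local filesystem: the jar must exist at this path on every node.
        return "per-node"
    if scheme == "":
        # No scheme: how the path is resolved depends on the context,
        # so spelling out the scheme is the safer choice (assumption).
        return "ambiguous"
    return "unknown"


if __name__ == "__main__":
    # Example paths are made up for illustration.
    for uri in (
        "hdfs:///user/hive/aux_files/es-hadoop.jar",
        "file:///opt/jars/es-hadoop.jar",
        "/opt/jars/es-hadoop.jar",
    ):
        print(uri, "->", jar_visibility(uri))
```

A "per-node" result is the case to watch for: the statement fails on any node where the jar is missing from that local path.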
I tried creating a directory on HDFS (/user/hive/aux_files/ES-hadoop/) containing all the JAR files from the 2.1.0.Beta4 zip file and referencing it in the Cloudera safety valve:
Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml
But I found a partial solution with the HIVE_AUX_JARS_PATH environment variable. IT WORKS, but I want to know how to make it work by configuring it in the hive-site.xml file.
The problem with using this environment variable is that all nodes must have it set.
Unfortunately, things can change between Hadoop versions, especially on the configuration side. And it looks like hive.aux.jars.path is a victim of this.
While it should work, as mentioned here, it actually stopped working once Hive moved over to HiveServer2, so using the env variable seems to be your safest bet.
P.S. Note that the CLI allows this to work while the server does not...
I added the two additional jars to HDFS under /user/ec2-user and then defined hive.aux.jars.path in Cloudera Manager, in the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
<property>
  <name>hive.aux.jars.path</name>
  <value>/user/ec2-user/elasticsearch-hadoop-hive-2.2.0-beta1.jar,/user/ec2-user/json-serde-1.3.6-jar-with-dependencies.jar</value>
  <description>A comma separated list (with no spaces) of the jar files</description>
</property>
However, when I fire up the Hive CLI, the JSON SerDe isn't found, and when I reference the Hadoop Connector in DDL, I get an exception:
hive> CREATE EXTERNAL TABLE videowatch LIKE output STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'videowatch/watch');
FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'
Obviously, the classpath didn't get modified. Any ideas? Thanks.
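One way to rule out a wrong or incomplete jar before digging into the classpath is to check that the class is actually inside the file. The following is a minimal Python sketch, assuming the jar has been copied to a local path; jar_contains_class is a hypothetical helper name. It only inspects the archive and says nothing about whether Hive loads it.

```python
import sys
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar (a zip archive) has an entry for the given class."""
    # A class a.b.C is stored inside the archive as a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_jar.py <path to a local copy of the jar>
    handler = "org.elasticsearch.hadoop.hive.EsStorageHandler"
    print("found" if jar_contains_class(sys.argv[1], handler) else "missing")
```

If the class is found in the jar but Hive still reports "Cannot find class", the jar itself is fine and the problem is in how it is made visible to Hive.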
You mention using es-hadoop-2.1.1 but your config indicates 2.2.0-beta1.
Distro-specific setups are outside the scope of the documentation, since they tend to differ and each release introduces a slightly different way of specifying things. It would be too hard to track all of them, which is why the docs point out what works with vanilla Hive, which should work in your distro as well.
From what I can find, one needs 3 steps to make the hive.aux.jars.path work
Last but not least, you are resurrecting a 5-month-old thread - please start a new one.
"Error while compiling statement: FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'"
For this error, you need to set the property like this, and it will work:
<property>
  <name>hive.aux.jars.path</name>
  <value>/home/hduser/elasticsearch-hadoop-2.1.0/dist/elasticsearch-hadoop-hive-2.1.0.jar</value>
  <description>A comma separated list (with no spaces) of the jar files</description>
</property>
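Since the value has to be a comma-separated list with no spaces, a stray space after a comma is an easy mistake to make. As a rough illustration, the Python sketch below builds the hive.aux.jars.path property from a list of jar paths and strips surrounding whitespace. The helper name aux_jars_property is hypothetical, and the output is only a fragment to paste into hive-site.xml or a safety valve, not a complete file.

```python
import xml.etree.ElementTree as ET


def aux_jars_property(jar_paths):
    """Render a <property> element for hive.aux.jars.path (illustrative helper)."""
    # Trim each path and drop empty entries so the joined value has no spaces.
    value = ",".join(p.strip() for p in jar_paths if p.strip())
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hive.aux.jars.path"
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Jar path taken from the post above; adjust to your own location.
    print(aux_jars_property([
        "/home/hduser/elasticsearch-hadoop-2.1.0/dist/elasticsearch-hadoop-hive-2.1.0.jar",
    ]))
```

Whether the server then honours the property is a separate question, as discussed earlier in the thread.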