Hive integration always fails

Hello everyone!

I have a problem I can't solve; I've tried everything I found on the internet and everything I could think of.

I'm trying to integrate es-hadoop (2.0.2) with Cloudera CDH (5.4.0), installed with Cloudera Manager.

From Hive, if I run "ADD JAR" first and then create an external table with "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'", it works. The JAR files aren't in HDFS; they are on the local filesystem.
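
For reference, this is roughly the manual flow that works (the jar path, table name, columns and index name are placeholders for my real ones):

  ADD JAR /usr/lib/ES-hadoop/elasticsearch-hadoop-2.0.2.jar;

  CREATE EXTERNAL TABLE es_test (id BIGINT, name STRING)
  STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
  TBLPROPERTIES('es.resource' = 'myindex/mytype');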

But I want to add this directly to the Hive config. I added the property to hive-site.xml from the command line, directly on the server, and I also added it through the Cloudera Manager option "Advanced Configuration Snippet (Safety Valve) for hive-site.xml" (service-wide, HiveServer2 only, Metastore...).
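
The snippet I'm adding is roughly the following (the path is a placeholder for where the jar actually sits on the server):

  <property>
    <name>hive.aux.jars.path</name>
    <value>/usr/lib/ES-hadoop/elasticsearch-hadoop-2.0.2.jar</value>
  </property>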

I tried the path both with "file://" and without it, but I always get this message:

"Error while compiling statement: FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'"

If anybody can help me,

thanks!

First of all, for the latest releases of CDH, do use 2.1.0.Beta4, since things in Hadoop land changed in a binary-incompatible way and 2.0.2 is likely not going to work.
Second, and again this is not particular to es-hadoop, you need the JARs to be in HDFS. If they aren't, they need to be on all the nodes where the Hive script might run, at the same path.
Basically, Hive doesn't handle provisioning, so the script is executed as is: if the files are in HDFS, then all nodes have access to them. If you use a file:// URI, you refer to the local filesystem on every node and thus, unless the file is there, it won't be found.
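
To illustrate the difference (paths made up, and assuming your Hive version accepts an hdfs:// URI in ADD JAR):

  -- jar lives in HDFS, so every node running the job can reach it
  ADD JAR hdfs:///user/hive/aux_jars/elasticsearch-hadoop-2.1.0.Beta4.jar;

  -- file:// resolves against each node's local disk, so this only works
  -- if the jar sits at this exact path on every single node
  ADD JAR file:///usr/lib/ES-hadoop/elasticsearch-hadoop-2.1.0.Beta4.jar;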

This is explained in the es-hadoop docs as well as in the Hive documentation. I believe, though I'm not sure, that CDH has some mechanism to help with uploading the files you want into HDFS; I don't know what it is, but their docs should cover it.

Hello, first thanks for your reply.

I tried creating a directory on HDFS (/user/hive/aux_files/ES-hadoop/) containing all the JAR files provided in the 2.1.0.Beta4 zip file, and referencing it in the Cloudera safety valve:

Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml

  <property>
    <name>hive.aux.jars.path</name>
    <value>/user/hive/aux_files/ES-hadoop/elasticsearch-hadoop-2.1.0.Beta4.jar</value>
  </property>

and I also tried having all the JARs on the three nodes, on the local filesystem:

  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///usr/lib/ES-hadoop/elasticsearch-hadoop-2.1.0.Beta4.jar</value>
  </property>

but it still fails.

However, I found a partial solution with the HIVE_AUX_JARS_PATH environment variable. IT WORKS, but I'd like to know how to make it work by configuring it in the hive-site.xml file.

Using this environment variable has the drawback that all nodes must have it set.
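
For reference, what I'm setting on every node (in hive-env.sh, before starting the services; the exact path is just where I keep the connector locally) is roughly:

  export HIVE_AUX_JARS_PATH=/usr/lib/ES-hadoop/elasticsearch-hadoop-2.1.0.Beta4.jar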

Thanks in advance.

Unfortunately, in Hadoop things might change between versions, especially on the configuration side, and it looks like hive.aux.jars.path is a victim of this.
While it should work, as mentioned here, it actually does not once Hive moved over to HiveServer2, so using the env variable seems to be your safest bet.

P.S. Note that the CLI allows this to work while the server does not...

Hello Costin,

thanks for your replies :slight_smile:

I'll keep the configuration using an env variable.

I am having the same problem defining the Hadoop Connector for Hive. I am using:

CDH 5.4
JSON SerDe json-serde-1.3.6-jar-with-dependencies.jar
elasticsearch-hadoop-2.1.1

I added the two additional JARs to HDFS under the /user/ec2-user path and then defined hive.aux.jars.path in Cloudera Manager in the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml:

  <property>
    <name>hive.aux.jars.path</name>
    <value>/user/ec2-user/elasticsearch-hadoop-hive-2.2.0-beta1.jar,/user/ec2-user/json-serde-1.3.6-jar-with-dependencies.jar</value>
    <description>A comma separated list (with no spaces) of the jar files</description>
  </property>
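
For reference, the jars were copied into HDFS with something like this (run from the local directory where I downloaded them):

  hadoop fs -put elasticsearch-hadoop-hive-2.2.0-beta1.jar /user/ec2-user/
  hadoop fs -put json-serde-1.3.6-jar-with-dependencies.jar /user/ec2-user/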

However, when I fire up the Hive CLI, the JSON SerDe isn't found, and when I reference the Hadoop Connector in DDL, I get an exception:

hive> CREATE EXTERNAL TABLE videowatch LIKE output STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'videowatch/watch');
FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'

Obviously, the classpath didn't get modified. Any ideas? Thanks.

You mention using es-hadoop-2.1.1 but your config indicates 2.2.0-beta1.
Distro-specific setups are outside the scope of the documentation, since they tend to differ and each release introduces a slightly different way of specifying things; it would be too hard to track all of them, which is why the docs point out what works with vanilla Hive, which should work in your distro as well.

From what I can find, one needs three steps to make hive.aux.jars.path work.

Last but not least, you are resurrecting a 5 months old thread - please start a new one.

"Error while compiling statement: FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.EsStorageHandler'"

For this error, you need to do the following and it will work :slight_smile:

  <property>
    <name>hive.aux.jars.path</name>
    <value>/home/hduser/elasticsearch-hadoop-2.1.0/dist/elasticsearch-hadoop-hive-2.1.0.jar</value>
    <description>A comma separated list (with no spaces) of the jar files</description>
  </property>