Hive not working with Elasticsearch

I am trying to load Hive table data into Elasticsearch, but the job keeps failing. Any help would be appreciated.

The errors I get are "Unable to find class: org.elasticsearch.hadoop.hive.EsHiveInputFormat" and "FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask". The Hive execution engine is set to `mr`.
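For context, the job is an INSERT into an external table backed by the ES storage handler, along these lines (a sketch only; the table, index, and host names here are placeholders, not my actual ones):

```sql
-- Sketch of the ES-backed table being written to (placeholder names).
CREATE EXTERNAL TABLE es_export (id BIGINT, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'myindex/mytype',  -- target Elasticsearch index/type
  'es.nodes'    = 'es-host:9200'     -- Elasticsearch node to connect to
);

-- The INSERT that launches the failing MapReduce job.
INSERT OVERWRITE TABLE es_export
SELECT id, name FROM source_table;
```

Full output from the failed run: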

```
Logging initialized using configuration in file:/etc/hive/2.6.0.3-8/0/hive-log4j.properties
OK
Time taken: 1.118 seconds
Query ID = hive_20190725220921_fd962234-e1a2-41af-89f3-b7e4cc27e86d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1563949354091_0016, Tracking URL = http://hadoop-node.test.com:8088/proxy/application_1563949354091_0016/
Kill Command = /usr/hdp/2.6.0.3-8/hadoop/bin/hadoop job  -kill job_1563949354091_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-07-25 22:09:28,184 Stage-1 map = 0%,  reduce = 0%
2019-07-25 22:09:41,582 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1563949354091_0016 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1563949354091_0016_m_000000 (and more) from job job_1563949354091_0016

Task with the most failures(4):
-----
Task ID:
  task_1563949354091_0016_m_000000

URL:
  http://hadoop-node.test.com:8088/taskdetails.jsp?jobid=job_1563949354091_0016&tipid=task_1563949354091_0016_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Failed to load plan: hdfs://hadoop-node.test.com:8020/tmp/hive/hive/d6d4079b-4813-4e2f-97be-5ea3acea7efa/hive_2019-07-25_22-09-21_665_114610607148402302-1/-mr-10002/268762c6-18a3-467a-936e-7cab06dd1f1c/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.elasticsearch.hadoop.hive.EsHiveInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
	removing log lines due to character limitations
	.
	.
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.elasticsearch.hadoop.hive.EsHiveInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:238)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:226)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:745)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
        at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1182)
        at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:1069)
        at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:1083)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:439)
        ... 13 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.hive.EsHiveInputFormat
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
        ... 49 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
```

Please make sure that ES-Hadoop is available on the job's classpath, following the installation instructions for Hive. In our experience, the most reliable way to install it is via the ADD JAR directive rather than relying on Hive's jar path settings.
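For example, at the start of the Hive session or script (the jar path and version below are placeholders; point it at wherever elasticsearch-hadoop is installed on your cluster):

```sql
-- Put the ES-Hadoop connector on the job classpath for this session.
-- This is what makes org.elasticsearch.hadoop.hive.EsHiveInputFormat
-- resolvable inside the MapReduce tasks. Path/version are placeholders.
ADD JAR /path/to/elasticsearch-hadoop-6.8.0.jar;

-- Optional: confirm the jar is registered.
LIST JARS;
```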

Thanks, James, for your response. I have not tried adding it on the command line yet, but I will try that and post the output.

OK James, it worked! Since I am planning to use a script, I added the es-hadoop jar in the script itself, as sketched below. Many thanks for your pointer; it really helped. I still have two more questions about the following output:
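Concretely, the script now registers the jar before the query runs, roughly like this (the jar path is a placeholder for my environment):

```sql
-- export_to_es.hql (sketch) -- run with: hive -f export_to_es.hql
ADD JAR /path/to/elasticsearch-hadoop-6.8.0.jar;  -- placeholder path

INSERT OVERWRITE TABLE es_export
SELECT id, name FROM source_table;
```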

```
Logging initialized using configuration in file:/etc/hive/2.6.0.3-8/0/hive-log4j.properties
OK
Time taken: 1.123 seconds
Query ID = hive_20190821053037_71f06b01-32a4-4ae4-9aad-4f060b6ecf72
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1566299397287_0001, Tracking URL = http://hadoop-node.test.com:8011/proxy/app_1566299397287_0001/
**Kill Command = /usr/hdp/2.6.0.3-8/hadoop/bin/hadoop job  -kill job_1566299397287_0001**
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-08-21 05:30:43,844 Stage-1 map = 0%,  reduce = 0%
2019-08-21 05:30:49,051 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.4 sec
MapReduce Total cumulative CPU time: 2 seconds 400 msec
Ended Job = job_1566299397287_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.4 sec   HDFS Read: 4904 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 400 msec
OK
Time taken: 13.121 seconds
```
  1. Why is the job getting killed, as highlighted in the output above?
  2. How do I continuously read from the Hive table and update the same index in ES?

I'm not sure I see the failure in your logs here. Can you share the exact line that details the failure?

Reading continuously from a Hive table and updating ES will require a regularly scheduled Hive query to perform the updates. The ES Hive integration just creates an external table over the existing ES index. Since Hive is a batch tool, you'll need to handle the regular data synchronization on your own, or switch to a streaming library that can feed the data to both datastores. A minimal sketch of the scheduled approach follows.
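Here is what that could look like (table, column, index, and host names are placeholders; `es.mapping.id` and `es.write.operation` are ES-Hadoop settings that make repeated runs update existing documents rather than duplicate them):

```sql
-- External table over the existing ES index (placeholder names).
CREATE EXTERNAL TABLE es_users (id BIGINT, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource'        = 'users/_doc',   -- existing index/type
  'es.nodes'           = 'es-host:9200',
  'es.mapping.id'      = 'id',           -- use this column as the document _id
  'es.write.operation' = 'upsert'        -- update if the id exists, insert otherwise
);

-- Run on a schedule (cron, Oozie, Airflow, ...) to keep the index in sync.
INSERT INTO TABLE es_users
SELECT id, name FROM hive_users;
```

With a stable `es.mapping.id`, re-running the query overwrites the same documents, so the scheduled job stays idempotent for unchanged rows.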
