I have integrated my Elasticsearch (ES) cluster with Hadoop.
In Hive, "Select * from EXTERNAL_ES_TABLE limit 100" works fine against the external Elasticsearch cluster.
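For context, an external table of this kind is normally declared with the ES-Hadoop storage handler. The following is only a minimal sketch: apart from the table name and the ts column, everything in it (the index/type name and the host) is a placeholder I am assuming for illustration, not my real settings.

```sql
-- Illustrative sketch only: the index name and host below are placeholders.
CREATE EXTERNAL TABLE EXTERNAL_ES_TABLE (
  ts TIMESTAMP                          -- the column used in the YEAR(ts) filter
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'my_index/my_type',               -- hypothetical index/type
  'es.nodes'    = 'es-host.example.internal:9200'   -- hypothetical ES node
);
```

Queries on a table declared this way read through the ES-Hadoop connector classes, so the connector jar has to be loadable wherever the query plan is deserialized.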
I have LLAP enabled and am using Tez.
I have also set webhcat.proxyuser.admin.hosts and webhcat.proxyuser.admin.groups in my custom Hadoop configuration. My Hive and Tez views load fine.
However, when I run something like "Select count(*) from EXTERNAL_ES_TABLE where YEAR(ts)='2017'", I get the error below:
[08S01]: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1503698180066_0037_1_00, diagnostics=[Vertex vertex_1503698180066_0037_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:70)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:151)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4620)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4400(VertexImpl.java:202)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3436)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3385)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3366)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1938)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:201)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2069)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2055)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://doppler-hdp-namenode.c.doppler-cloud.internal:8020/tmp/hive/hive/423767ce-1e5f-4e6b-8dd9-127bf759591e/hive_2017-08-29_02-57-58_486_2719292898503155534-5/hive/_tez_scratch_dir/1b245ab6-210d-4e6e-9a39-b89cd37f59cb/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.elasticsearch.hadoop.hive.EsHiveInputFormat