Select es-hadoop table from hive failed


(Joe Yang) #1

Dear All:

We just created an external table from Hive successfully:
CREATE EXTERNAL TABLE user01(id BIGINT, name STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists','es.index.auto.create' = 'true','es.nodes'='192.168.112.172','es.port'='9200');

But when we query it, it fails with the following error. Any advice would be appreciated.
hive> select * from user01;
OK
Failed with exception java.io.IOException:org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[127.0.0.1:9200]]
Time taken: 0.108 seconds
hive>


(Joe Yang) #2

any prompt comment would be appreciated...


(Joe Yang) #3

Dear All,

any advice would be welcome ...

If we insert from an internal table into this external one, the error message is as follows:

hive> insert overwrite table user01 select * from user_source;
Query ID = csi_20160329111717_af94bd46-34e9-44c8-b29b-51892eeb3c25
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/csi/.staging/job_1458728756355_0018/libjars/accumulo-fate-1.6.0.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3110)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3034)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

    at org.apache.hadoop.ipc.Client.call(Client.java:1476)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

Job Submission failed with exception 'org.apache.hadoop.ipc.RemoteException(File /tmp/hadoop-yarn/staging/csi/.staging/job_1458728756355_0018/libjars/accumulo-fate-1.6.0.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3110)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3034)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)


(Joe Yang) #4

hadoop:2.7.1
hive: 1.2.1
es-hadoop: 2.1.1


(Joe Yang) #5

hive> CREATE EXTERNAL TABLE user01 (id bigint, name STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
> TBLPROPERTIES('es.resource' = 'radio/artists', 'es.index.auto.create' = 'true');
OK
Time taken: 0.178 seconds
hive> INSERT OVERWRITE TABLE user01 SELECT id, name FROM user_source;
Query ID = csi_20160329152402_516b7db1-38d8-4921-bcae-708b23d73909
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1458728756355_0020, Tracking URL = http://server1:8088/proxy/application_1458728756355_0020/
Kill Command = /home/csi/hadoop/bin/hadoop job -kill job_1458728756355_0020

It hangs after printing the kill command.


(Joe Yang) #6

Hello:

Any prompt update would be appreciated, please..


(Costin Leau) #7

You are showing different configurations and mixing the errors with the so-called code snippets (By the way, formatting helps).

  1. if you don't specify any nodes, es-hadoop will use localhost instead. 99% of the time, this is not what you want.
  2. use the latest es-hadoop, namely 2.2.0.
  3. check out the troubleshooting page
    https://www.elastic.co/guide/en/elasticsearch/hadoop/master/troubleshooting.html
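
To illustrate point 1, here is a minimal sketch of the table from post #5 recreated with the Elasticsearch node set explicitly. The address and port are simply the ones used in post #1; whether that host is reachable from the machines running the Hive tasks is an assumption, and this is not a confirmed fix for the errors above.

```sql
-- user01 is an external table, so dropping it removes only the Hive metadata.
DROP TABLE IF EXISTS user01;

-- Same definition as in post #5, plus explicit connection settings.
CREATE EXTERNAL TABLE user01 (id BIGINT, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource' = 'radio/artists',
  'es.index.auto.create' = 'true',
  -- Assumed address, copied from post #1; without es.nodes, localhost is used.
  'es.nodes' = '192.168.112.172',
  'es.port' = '9200'
);
```

If the error still lists 127.0.0.1:9200 after this, the table properties are probably not the ones in effect for that query, which is worth checking with SHOW TBLPROPERTIES user01;.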

(Joe Yang) #8

Thanks Costin, will look into it and update it soon...

