Insert data from hive to elasticsearch


(Gzvince) #1

I have CDH 5.3.2 and Elasticsearch 1.5.2.
I then used elasticsearch-hadoop 2.0.2 to connect them.

I have created a table in Hive for es:

CREATE EXTERNAL TABLE user (id INT, name STRING)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = '10.88.1.2', 'es.resource' = 'radiott/artiststt', 'es.index.auto.create' = 'true');

Then I insert some data into the table:

INSERT OVERWRITE TABLE user SELECT s.id, s.username FROM test.eslog s limit 10;

When I try to insert the data, I get the following message:

Starting Job = job_1428991871637_0126, Tracking URL = http://rl-hadoop1:8088/proxy/application_1428991871637_0126/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job  -kill job_1428991871637_0126
Hadoop job information for Stage-0: number of mappers: 0; number of reducers: 0
2015-05-29 11:21:21,098 Stage-0 map = 0%,  reduce = 0%
Ended Job = job_1428991871637_0126 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-0:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

When I dig deeper into the Hadoop logs, I find the following warning:

PriviledgedActionException as:admin (auth:PROXY) via mapred (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /user/admin/.staging/job_1428991871637_0124/job_1428991871637_0124.summary
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1879)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1820)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1800)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1772)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

What am I missing? Thanks in advance!


(Gzvince) #2

I'm new to Hadoop. Please help.


(Costin Leau) #3

Looks like an error with your Hadoop/CDH install - maybe some configuration files are missing, or there is a problem with user credentials?

Also note that you don't have to specify the SerDe - what makes you do that? Neither the docs nor the readme specify it.
Additionally, you can try the 2.1.0.Beta4 build - see the reference docs here
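For reference, here is a sketch of your DDL with the redundant SerDe clause dropped (the storage handler registers the SerDe on its own; table and property values are taken from your post):

```sql
-- Same table as in the original post, minus ROW FORMAT SERDE;
-- EsStorageHandler wires in EsSerDe itself.
CREATE EXTERNAL TABLE user (id INT, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes' = '10.88.1.2',
  'es.resource' = 'radiott/artiststt',
  'es.index.auto.create' = 'true'
);
```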


(Gzvince) #4

I tried the 2.1.0.Beta4 build and it works very well. Thank you very much :smile:


(Costin Leau) #5

Great! Glad to hear it.
