Insert data from Hive to Elasticsearch

(Gzvince) #1

I have CDH 5.3.2 and Elasticsearch 1.5.2.
I have used Elasticsearch-Hadoop 2.0.2 to connect them.

I have created a table in Hive for Elasticsearch:

ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'  
TBLPROPERTIES('es.nodes' = '', 'es.resource' = 'radiott/artiststt','' = 'true');

And then I insert some data in the table:

INSERT OVERWRITE TABLE user SELECT, s.username FROM test.eslog s limit 10;
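
(As pasted, the statement has a stray comma after SELECT, so a column reference appears to have been lost. A syntactically valid form of this kind of insert - a sketch assuming a single string column, not the poster's exact query - would be:)

```sql
-- Hypothetical corrected form: select one column from the source table
-- into the Elasticsearch-backed Hive table.
INSERT OVERWRITE TABLE user
SELECT s.username FROM test.eslog s LIMIT 10;
```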

When I try to insert the data, I get the following message:

Starting Job = job_1428991871637_0126, Tracking URL = http://rl-hadoop1:8088/proxy/application_1428991871637_0126/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job  -kill job_1428991871637_0126
Hadoop job information for Stage-0: number of mappers: 0; number of reducers: 0
2015-05-29 11:21:21,098 Stage-0 map = 0%,  reduce = 0%
Ended Job = job_1428991871637_0126 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from
MapReduce Jobs Launched: 
Stage-Stage-0:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

When I dig deeper into the Hadoop logs, I see the following warning:

PriviledgedActionException as:admin (auth:PROXY) via mapred (auth:SIMPLE) File does not exist: /user/admin/.staging/job_1428991871637_0124/job_1428991871637_0124.summary
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
at org.apache.hadoop.ipc.RPC$
at org.apache.hadoop.ipc.Server$Handler$
at org.apache.hadoop.ipc.Server$Handler$
at Method)
at
at
at org.apache.hadoop.ipc.Server$

What am I missing? Thanks in advance!

(Gzvince) #2

I'm new to Hadoop. Please help.

(Costin Leau) #3

Looks like an error with your Hadoop/CDH install - maybe some configuration files are missing, or there may be a problem with user credentials?

Also note that you don't have to specify the SerDe - what makes you do that? Neither the docs nor the readme specify it.
Additionally, you can try the 2.1.0.Beta4 build - see the reference docs here.
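
(For reference, the pattern described in the elasticsearch-hadoop docs uses only STORED BY, without a ROW FORMAT SERDE clause, since the storage handler supplies the SerDe itself. A minimal sketch - the column name and node address are assumptions, not from this thread:)

```sql
-- Minimal Elasticsearch-backed Hive table; no ROW FORMAT SERDE needed,
-- the storage handler wires in EsSerDe on its own.
CREATE EXTERNAL TABLE user (username STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource' = 'radiott/artiststt',  -- index/type from the original post
  'es.nodes'    = 'localhost'           -- assumed; point at your ES node(s)
);
```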

(Gzvince) #4

I tried the 2.1.0.Beta4 build and it works very well. Thank you very much :smile:

(Costin Leau) #5

Great! Glad to hear it.

