Issue when adding data to ES index from Hive External Table


(Amol A Gaitonde) #1

I am trying to create an Elasticsearch index using an external table in Hive.

Drop table hdce.ES_Ext_movies;
CREATE EXTERNAL TABLE hdce.ES_Ext_movies (
ts_epoch bigint ,
movie_name string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = 'localhost', 'es.port' = '9201', 'es.nodes.client.only' = 'true', 'es.resource.write ' = 'fromhive/test',
'es.index.auto.create' = 'true');

I then inserted records into it:
Insert overwrite table hdce.ES_Ext_movies select unix_timestamp(current_timestamp) , A.movie_name from hdce.movies A limit 10;

I am getting this error:
INFO : Number of reduce tasks determined at compile time: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:1
INFO : Submitting tokens for job: job_1465210153149_0115
INFO : The url to track the job: http://hdfc02nn01.amr.corpcom:8088/proxy/application_1465210153149_0115/
INFO : Starting Job = job_1465210153149_0115, Tracking URL = http://hdfc02nn01.amr.corpcom:8088/proxy/application_1465210153149_0115/
INFO : Kill Command = /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/bin/hadoop job -kill job_1465210153149_0115
INFO : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 1
INFO : 2016-06-08 19:48:46,065 Stage-0 map = 0%, reduce = 0%
INFO : 2016-06-08 19:48:54,531 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.34 sec
INFO : 2016-06-08 19:49:23,987 Stage-0 map = 100%, reduce = 100%, Cumulative CPU 2.34 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 340 msec
ERROR : Ended Job = job_1465210153149_0115 with errors
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
The reducer job shows this error in the logs:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":1465440517,"_col1":"GoldenEye (1995)"}}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":1465440517,"_col1":"GoldenEye (1995)"}}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
    ... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
    at..

Does anyone know what the issue is?

Thanks.


(James Baiera) #2

You have es.nodes configured as "localhost". Is your installation of Hive running entirely locally? I ask only because I see Hive reporting a job tracking URL that looks remote: http://hdfc02nn01.amr.corpcom:8088/proxy/application_1465210153149_0115/
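
If the job's tasks run on cluster nodes rather than on the machine where Elasticsearch listens, "localhost" would resolve to each worker node itself, which would be consistent with the "Cannot detect ES version" error. Here is a minimal sketch of the table definition pointed at a reachable address instead. The hostname es-host.example.com is a hypothetical placeholder, not something taken from this thread, and the remaining properties are copied from the original statement:

```sql
-- Sketch only: es-host.example.com is a hypothetical placeholder for a host
-- that every Hadoop worker node can reach on the configured port.
DROP TABLE IF EXISTS hdce.ES_Ext_movies;
CREATE EXTERNAL TABLE hdce.ES_Ext_movies (
  ts_epoch   BIGINT,
  movie_name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes' = 'es-host.example.com',      -- was 'localhost'
  'es.port' = '9201',
  'es.nodes.client.only' = 'true',
  'es.resource.write' = 'fromhive/test',   -- key written without a trailing space
  'es.index.auto.create' = 'true');
```

Note that the 'es.resource.write' key is written here without the trailing space that appears inside the quotes in the original statement. Whether that space affects how the property is read is an assumption worth checking separately.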

