Hello,
ES: 6.2.3
ES-Hadoop connector: 6.2.3
I have an Elasticsearch-backed external table that I want to populate with data from a Hive table stored as Parquet files.
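For reference, the external table is defined roughly along these lines (a sketch; the index name, node address, and column list are placeholders here, not my real values):

CREATE EXTERNAL TABLE external_es (
  ...,
  origin_ip STRING,
  ...
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'my-index/my-type',  -- placeholder index/type
  'es.nodes' = 'es-node:9200'          -- placeholder node address
);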
When I execute:
INSERT OVERWRITE TABLE external_es
SELECT field-name1, field-name2, ..., field-name20
FROM parquet
LIMIT 999000000;
I get:
Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1573460958877_2186_1_01, diagnostics=[Task failed, taskId=task_1573460958877_2186_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"1.0","_col1":4,"_col2":1581674662000,"_col3":"","_col4":"d4b5cf34f7d0ee4a15c0b231f3e2aaa2","_col5":"8255bb0f2594fc19a3f752286f8a5a76","_col6":100,"_col7":"XXXXXXXXXXXXXXXXXXx","_col8":"","_col9":"","_col10":"10.0.18362.256","_col11":"","_col12":"","_col13":"XXXXXXXXXXXXXXXXXXx","_col14":null,"_col15":"clean","_col16":"12.22.197.186","_col17":"ore.amz","_col18":"xxxx","_col19":"am_engines","_col20":"61dc8483-c8bc-402b-a325-a4dc6fb10567","_col21":"e9ebd55d-2b3e-4291-9ba9-d966e619d4c4","_col22":null,"_col23":null,"_col24":null,"_col25":true,"_col26":"x64","_col27":"XXX","_col28":{"lat":34.281,"lon":-119.1702},"_col29":{"region_name":"California","country_code":"US","country_name":"United States","city":"Ventura","continent_code":"NA","asn":3421,"organization":"Company"}}}
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"1.0","_col1":4,"_col2":1581674662000,"_col3":"","_col4":"d4b5cf34f7d0ee4a15c0b231f3e2aaa2","_col5":"8255bb0f2594fc19a3f752286f8a5a76","_col6":100,"_col7":"","_col8":"","_col9":"","_col10":"10.0.18362.256","_col11":"","_col12":"","_col13":"213123241234412#fdfdf","_col14":null,"_col15":"clean","_col16":"12.22.197.186","_col17":"ore.amz","_col18":"xxxx","_col19":"fact","_col20":"61dc8483-c8bc-402b-a325-a4dc6fb10567","_col21":"e9ebd55d-2b3e-4291-9ba9-d966e619d4c4","_col22":null,"_col23":null,"_col24":null,"_col25":true,"_col26":"x64","_col27":"XXXX","_col28":{"lat":34.281,"lon":-119.1702},"_col29":{"region_name":"California","country_code":"US","country_name":"United States","city":"Ventura","continent_code":"NA","asn":3212,"organization":"Company"}}}
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:237)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
Caused by: org.elasticsearch.hadoop.EsHadoopException: Could not write all entries for bulk operation [1/951]. Error sample (first [5] error messages):
failed to parse [origin_ip]
Bailing out...
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.flush(BulkProcessor.java:475)
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.add(BulkProcessor.java:106)
at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:187)
at org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:183)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:63)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:763)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
What I don't understand is why I get this error only after successfully inserting about 9M rows into the ES index (the field is mapped as ip in ES).
The error says failed to parse [origin_ip], but origin_ip corresponds to "_col16", and as the JSON above shows, it holds a valid IP address ("_col16":"12.22.197.186").
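To rule out a data issue, I'm planning to scan the source table for values that wouldn't parse as an IP (a sketch; I'm assuming the source column feeding origin_ip has the same name, and the regex only covers plain IPv4, not IPv6):

SELECT origin_ip, COUNT(*) AS cnt
FROM parquet
WHERE origin_ip IS NULL
   -- NULLs and anything not matching dotted-quad form, including
   -- empty strings, would fail Elasticsearch's ip mapping
   OR NOT (origin_ip RLIKE '^\\d{1,3}(\\.\\d{1,3}){3}$')
GROUP BY origin_ip
ORDER BY cnt DESC
LIMIT 100;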
Does anyone have a clue?
Thanks!