Error while inserting data from Hadoop to ES

Hello,

ES: 6.2.3
Hadoop Plugin 6.2.3

I have an external table that I want to populate with data from a Hive table (Parquet files).

When I execute:
INSERT OVERWRITE TABLE external_es SELECT field-name1,field-name2,...,field-name20 FROM parquet limit 999000000;

I get:

Status: Failed

Vertex failed, vertexName=Reducer 2, vertexId=vertex_1573460958877_2186_1_01, diagnostics=[Task failed, taskId=task_1573460958877_2186_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"1.0","_col1":4,"_col2":1581674662000,"_col3":"","_col4":"d4b5cf34f7d0ee4a15c0b231f3e2aaa2","_col5":"8255bb0f2594fc19a3f752286f8a5a76","_col6":100,"_col7":"XXXXXXXXXXXXXXXXXXx","_col8":"","_col9":"","_col10":"10.0.18362.256","_col11":"","_col12":"","_col13":"XXXXXXXXXXXXXXXXXXx","_col14":null,"_col15":"clean","_col16":"12.22.197.186","_col17":"ore.amz","_col18":"xxxx","_col19":"am_engines","_col20":"61dc8483-c8bc-402b-a325-a4dc6fb10567","_col21":"e9ebd55d-2b3e-4291-9ba9-d966e619d4c4","_col22":null,"_col23":null,"_col24":null,"_col25":true,"_col26":"x64","_col27":"XXX","_col28":{"lat":34.281,"lon":-119.1702},"_col29":{"region_name":"California","country_code":"US","country_name":"United States","city":"Ventura","continent_code":"NA","asn":3421,"organization":"Company"}}}

    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"1.0","_col1":4,"_col2":1581674662000,"_col3":"","_col4":"d4b5cf34f7d0ee4a15c0b231f3e2aaa2","_col5":"8255bb0f2594fc19a3f752286f8a5a76","_col6":100,"_col7":"","_col8":"","_col9":"","_col10":"10.0.18362.256","_col11":"","_col12":"","_col13":"213123241234412#fdfdf","_col14":null,"_col15":"clean","_col16":"12.22.197.186","_col17":"ore.amz","_col18":"xxxx","_col19":"fact","_col20":"61dc8483-c8bc-402b-a325-a4dc6fb10567","_col21":"e9ebd55d-2b3e-4291-9ba9-d966e619d4c4","_col22":null,"_col23":null,"_col24":null,"_col25":true,"_col26":"x64","_col27":"XXXX","_col28":{"lat":34.281,"lon":-119.1702},"_col29":{"region_name":"California","country_code":"US","country_name":"United States","city":"Ventura","continent_code":"NA","asn":3212,"organization":"Company"}}}

    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:237)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    ... 14 more

Caused by: org.elasticsearch.hadoop.EsHadoopException: Could not write all entries for bulk operation [1/951]. Error sample (first [5] error messages):

failed to parse [origin_ip]
Bailing out...
    at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.flush(BulkProcessor.java:475)
    at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.add(BulkProcessor.java:106)
    at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:187)
    at org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:183)
    at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:63)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:763)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
    at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)

What I actually don't understand is why I get this error after successfully inserting about 9M rows into the ES index (the field is mapped as ip in ES).
It says failed to parse [origin_ip]. The origin_ip field corresponds to "_col16", and as you can see in the JSON it holds a valid IP ("_col16":"12.22.197.186").
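One way to sanity-check the column itself: the row that Hive logs may not be the one Elasticsearch rejected, so scanning the values for anything that doesn't parse as an IP can help. This is only a sketch — it uses Python's ipaddress module as a stand-in for the strictness of the ip field type (empty strings, hostnames, and out-of-range octets are all rejected), and the sample values are made up:

```python
import ipaddress

def invalid_ips(values):
    """Return the values that fail IP parsing.

    ipaddress.ip_address is used here as a rough stand-in for
    Elasticsearch's `ip` field parser: it rejects empty strings,
    non-numeric tokens, and out-of-range octets alike.
    """
    bad = []
    for v in values:
        try:
            ipaddress.ip_address(v)
        except ValueError:
            bad.append(v)
    return bad

# Hypothetical sample modeled on the kinds of values in the batch:
sample = ["12.22.197.186", "", "10.0.18362.256"]
print(invalid_ips(sample))  # → ['', '10.0.18362.256']
```

In practice the same screen could be run over an export of the origin_ip column to find which rows would fail before writing to ES.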

Does anyone have a clue?

Thanks!

@costin I look forward to your opinion. :slight_smile:

Thanks.

@zizake Please do not ping team members who are not part of the discussion already.

The issue you're seeing seems to be a remote exception from Elasticsearch, specifically that it does not understand the data given to it. I would check your mapping to make sure that the field accepts the values you are sending and that it isn't expecting something else.
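Note also what the [1/951] counter in the error means: one document out of a 951-entry bulk request was rejected, so the offending value is likely in a different row than the one Hive happened to log. A rough illustration of that per-document behavior (the document list is invented, and ipaddress again stands in for the ip field parser):

```python
import ipaddress

# Hypothetical batch: 950 good documents plus one with an empty
# origin_ip, mirroring how a single bad row can fail an otherwise
# clean bulk request while every logged row looks valid.
docs = [{"origin_ip": "12.22.197.186"}] * 950 + [{"origin_ip": ""}]

def accepted(doc):
    """Mimic the ip field parser: reject anything that won't parse."""
    try:
        ipaddress.ip_address(doc["origin_ip"])
        return True
    except ValueError:
        return False

rejected = [i for i, d in enumerate(docs) if not accepted(d)]
print(f"{len(rejected)}/{len(docs)} documents rejected")  # → 1/951 documents rejected
```

So a valid-looking origin_ip in the Hive error output doesn't rule out an empty or malformed value elsewhere in the same batch.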

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.