Error while inserting data from Hadoop to ES

Hello,

ES: 6.2.3
Hadoop Plugin 6.2.3

I have an external table that I want to populate with data from a Hive table (Parquet files).

When I execute:
INSERT OVERWRITE TABLE external_es SELECT field-name1,field-name2,...,field-name20 FROM parquet limit 999000000;

I get:

Status: Failed

Vertex failed, vertexName=Reducer 2, vertexId=vertex_1573460958877_2186_1_01, diagnostics=[Task failed, taskId=task_1573460958877_2186_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"1.0","_col1":4,"_col2":1581674662000,"_col3":"","_col4":"d4b5cf34f7d0ee4a15c0b231f3e2aaa2","_col5":"8255bb0f2594fc19a3f752286f8a5a76","_col6":100,"_col7":"XXXXXXXXXXXXXXXXXXx","_col8":"","_col9":"","_col10":"10.0.18362.256","_col11":"","_col12":"","_col13":"XXXXXXXXXXXXXXXXXXx","_col14":null,"_col15":"clean","_col16":"12.22.197.186","_col17":"ore.amz","_col18":"xxxx","_col19":"am_engines","_col20":"61dc8483-c8bc-402b-a325-a4dc6fb10567","_col21":"e9ebd55d-2b3e-4291-9ba9-d966e619d4c4","_col22":null,"_col23":null,"_col24":null,"_col25":true,"_col26":"x64","_col27":"XXX","_col28":{"lat":34.281,"lon":-119.1702},"_col29":{"region_name":"California","country_code":"US","country_name":"United States","city":"Ventura","continent_code":"NA","asn":3421,"organization":"Company"}}}

    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"1.0","_col1":4,"_col2":1581674662000,"_col3":"","_col4":"d4b5cf34f7d0ee4a15c0b231f3e2aaa2","_col5":"8255bb0f2594fc19a3f752286f8a5a76","_col6":100,"_col7":"","_col8":"","_col9":"","_col10":"10.0.18362.256","_col11":"","_col12":"","_col13":"213123241234412#fdfdf","_col14":null,"_col15":"clean","_col16":"12.22.197.186","_col17":"ore.amz","_col18":"xxxx","_col19":"fact","_col20":"61dc8483-c8bc-402b-a325-a4dc6fb10567","_col21":"e9ebd55d-2b3e-4291-9ba9-d966e619d4c4","_col22":null,"_col23":null,"_col24":null,"_col25":true,"_col26":"x64","_col27":"XXXX","_col28":{"lat":34.281,"lon":-119.1702},"_col29":{"region_name":"California","country_code":"US","country_name":"United States","city":"Ventura","continent_code":"NA","asn":3212,"organization":"Company"}}}

    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:237)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    ... 14 more

Caused by: org.elasticsearch.hadoop.EsHadoopException: Could not write all entries for bulk operation [1/951]. Error sample (first [5] error messages):

failed to parse [origin_ip]
Bailing out...
    at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.flush(BulkProcessor.java:475)
    at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.add(BulkProcessor.java:106)
    at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:187)
    at org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:183)
    at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:63)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:763)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
    at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)

What I actually don't understand is why I get this error after successfully inserting about 9M rows into the ES index (the field is mapped as ip in ES).
It says failed to parse [origin_ip]. The origin_ip field corresponds to "_col16", and as you can see in the JSON it holds a valid IP ("_col16":"12.22.197.186").
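One way to sanity-check the column itself: the row that Hive logs may not be the one Elasticsearch rejected, so scanning the values for anything that doesn't parse as an IP can help. This is only a sketch — it uses Python's ipaddress module as a stand-in for the strictness of the ip field type (empty strings, hostnames, and out-of-range octets are all rejected), and the sample values are made up:

```python
import ipaddress

def invalid_ips(values):
    """Return the values that fail IP parsing.

    ipaddress.ip_address is used here as a rough stand-in for
    Elasticsearch's `ip` field parser: it rejects empty strings,
    non-numeric tokens, and out-of-range octets alike.
    """
    bad = []
    for v in values:
        try:
            ipaddress.ip_address(v)
        except ValueError:
            bad.append(v)
    return bad

# Hypothetical sample modeled on the kinds of values in the batch:
sample = ["12.22.197.186", "", "10.0.18362.256"]
print(invalid_ips(sample))  # → ['', '10.0.18362.256']
```

In practice the same screen could be run over an export of the origin_ip column to find which rows would fail before writing to ES.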

Does anyone have a clue?

Thanks!

@costin I look forward to your opinion. :slight_smile:

Thanks.

@zizake Please do not ping team members who are not part of the discussion already.

The issue you're seeing seems to be a remote exception from Elasticsearch, specifically that it does not understand the data given to it. I would check your mapping to make sure that the field accepts the values you are sending and that it isn't expecting something else.
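Note also what the [1/951] counter in the error means: one document out of a 951-entry bulk request was rejected, so the offending value is likely in a different row than the one Hive happened to log. A rough illustration of that per-document behavior (the document list is invented, and ipaddress again stands in for the ip field parser):

```python
import ipaddress

# Hypothetical batch: 950 good documents plus one with an empty
# origin_ip, mirroring how a single bad row can fail an otherwise
# clean bulk request while every logged row looks valid.
docs = [{"origin_ip": "12.22.197.186"}] * 950 + [{"origin_ip": ""}]

def accepted(doc):
    """Mimic the ip field parser: reject anything that won't parse."""
    try:
        ipaddress.ip_address(doc["origin_ip"])
        return True
    except ValueError:
        return False

rejected = [i for i, d in enumerate(docs) if not accepted(d)]
print(f"{len(rejected)}/{len(docs)} documents rejected")  # → 1/951 documents rejected
```

So a valid-looking origin_ip in the Hive error output doesn't rule out an empty or malformed value elsewhere in the same batch.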

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.