Problems Indexing a Nested document to Elasticsearch using Hive


(sarya) #1

Hi,
We are trying to index nested documents to Elasticsearch using Hive. The document that is failing is about 1.5 MB in size and has one parent and about 900 child documents under it. We have applied the best practices from the link below.

https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html

We are loading the document as JSON using the string data type.

We are using Elasticsearch 1.5 and Hive 0.10.
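For reference, our setup follows the JSON-passthrough pattern from that page; the sketch below is illustrative (table, index, and column names are placeholders, not our actual schema):

```sql
-- Hypothetical external table backed by Elasticsearch via es-hadoop.
CREATE EXTERNAL TABLE es_docs (json_doc STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource'   = 'myindex/mytype',
  'es.input.json' = 'true'   -- the column already contains a full JSON document
);

-- Push the pre-built JSON strings straight to Elasticsearch.
INSERT OVERWRITE TABLE es_docs
SELECT json_doc FROM staging_docs;
```

With `es.input.json` set to `true`, es-hadoop sends each string value as-is in the bulk request, so every row must be a complete, valid JSON document.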

We keep getting the error below:
2015-12-16 16:18:36,914 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row ....
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: ElasticsearchParseException[Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@1]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:320)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:297)
at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:149)
at org.elasticsearch.hadoop.rest.RestRepository.sendBatch(RestRepository.java:170)
at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:152)
at org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:146)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:63)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:606)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)

Any help is appreciated.

Thanks
Subra


(system) #2