Hi all, I want to sink my data from Hive into Elasticsearch.
The Hive table looks like this:
CREATE TABLE `dwd_ddjk_hospital`(
`orgid` bigint,
`orgcode` string,
..........
`deptids` array<string>,
`diseasecodes` array<string>,
`deptnames` array<string>,
`diseasenames` array<string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
Each array column holds 1,000-5,000 elements.
When I sink the data into my ES cluster, the job reports the following error:
2021-01-25 14:58:45,416 INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: PLAN PATH = hdfs://emr-header-1.cluster-187059:9000/tmp/hive/anonymous/64f4d795-b68c-4d6f-987d-dc1774bc3edc/hive_2021-01-25_14-58-28_239_344285465562172971-505/-mr-10002/b2b74a17-2e59-46bd-a45f-27b2f940cbeb/map.xml
2021-01-25 14:58:45,457 ERROR [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:973)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1731)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 262144 at start offset 0 for length 181063 to read 26 fields with types [bigint, string, string, string, string, string, string, string, string, int, string, string, string, string, string, string, string, string, double, double, string, array<string>, array<string>, array<string>, array<string>, array<string>]. Read field #23 at field start position 2925 for field length 45317
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:928)
... 10 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:187)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:588)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeComplexFieldRowColumn(VectorDeserializeRow.java:778)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeListRowColumn(VectorDeserializeRow.java:822)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:938)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:923)
... 10 more
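My guess (not verified) is that this is related to vectorized execution: the 1024 in the exception matches Hive's default vectorized row batch size, and my array columns can contain far more than 1024 elements. If so, something like the following session setting might work around it; the target table name `es_hospital` is just a placeholder for my actual ES-backed table:

```sql
-- Hedged guess: disable vectorized execution for this query, since
-- BytesColumnVector seems to overflow at the default batch size of 1024.
SET hive.vectorized.execution.enabled=false;

-- es_hospital is a hypothetical ES-backed external table name.
INSERT OVERWRITE TABLE es_hospital
SELECT * FROM dwd_ddjk_hospital;
```

I have not confirmed whether this avoids the error, or whether it only hides a deeper problem with complex types.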
I don't understand why an array-out-of-bounds error occurs here. Does anyone have a solution?
Best regards.