Sink data from Hive to Elasticsearch

Hi all, I want to sink my data from Hive to Elasticsearch (ES).

The table in Hive looks like this:

    CREATE TABLE `dwd_ddjk_hospital`(
      `orgid` bigint,
      `orgcode` string,
      ..........
      `deptids` array<string>,
      `diseasecodes` array<string>,
      `deptnames` array<string>,
      `diseasenames` array<string>)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

Each of the array columns has 1,000-5,000 elements.
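For reference, the sink itself goes through the standard elasticsearch-hadoop Hive integration; below is a minimal sketch of what I'm running, where the jar path, index name, and ES host are placeholders and only a few of the columns are shown:

    -- Register the elasticsearch-hadoop connector (placeholder path)
    ADD JAR /path/to/elasticsearch-hadoop.jar;

    -- External table backed by the ES index (placeholder index name and host)
    CREATE EXTERNAL TABLE es_ddjk_hospital (
      `orgid` bigint,
      `orgcode` string,
      `deptids` array<string>,
      `diseasenames` array<string>)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES (
      'es.resource' = 'ddjk_hospital',
      'es.nodes' = 'es-host:9200');

    -- The sink: read rows from Hive and write them to ES
    INSERT OVERWRITE TABLE es_ddjk_hospital
    SELECT `orgid`, `orgcode`, `deptids`, `diseasenames`
    FROM dwd_ddjk_hospital;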

When I sink the data into my ES cluster, the job reports the following error:

    2021-01-25 14:58:45,416 INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: PLAN PATH = hdfs://emr-header-1.cluster-187059:9000/tmp/hive/anonymous/64f4d795-b68c-4d6f-987d-dc1774bc3edc/hive_2021-01-25_14-58-28_239_344285465562172971-505/-mr-10002/b2b74a17-2e59-46bd-a45f-27b2f940cbeb/map.xml
    2021-01-25 14:58:45,457 ERROR [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
            at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:973)
            at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154)
            at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
            at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
            at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
            at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1731)
            at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
    DeserializeRead detail: Reading byte[] of length 262144 at start offset 0 for length 181063 to read 26 fields with types [bigint, string, string, string, string, string, string, string, string, int, string, string, string, string, string, string, string, string, double, double, string, array<string>, array<string>, array<string>, array<string>, array<string>].  Read field #23 at field start position 2925 for field length 45317
            at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:928)
            ... 10 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
            at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:187)
            at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:588)
            at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeComplexFieldRowColumn(VectorDeserializeRow.java:778)
            at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeListRowColumn(VectorDeserializeRow.java:822)
            at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:938)
            at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
            at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:923)
            ... 10 more

I don't understand why an array-out-of-bounds problem occurs here. Does anyone have a solution?
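For what it's worth, the trace shows the failure inside Hive's vectorized row deserialization (VectorDeserializeRow writing into a BytesColumnVector), so one workaround I plan to try before re-running the sink is turning vectorization off; a minimal sketch, assuming vectorized reading of the large array columns is the trigger:

    -- Disable vectorized execution for this session, then re-run the INSERT
    set hive.vectorized.execution.enabled=false;

    -- If your Hive version supports it, vectorization of complex types
    -- can also be toggled on its own (an assumption; check your version):
    -- set hive.vectorized.complex.types.enabled=false;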

Best regards.
