Hi all, I want to sink my data from Hive into Elasticsearch.
The Hive table looks like this:
CREATE TABLE `dwd_ddjk_hospital`(
`orgid` bigint,
`orgcode` string,
..........
`deptids` array<string>,
`diseasecodes` array<string>,
`deptnames` array<string>,
`diseasenames` array<string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
Each array column holds 1,000-5,000 elements.
When I sink the data into my ES cluster, the job reports the following error:
2021-01-25 14:58:45,416 INFO [main] org.apache.hadoop.hive.ql.exec.Utilities: PLAN PATH = hdfs://emr-header-1.cluster-187059:9000/tmp/hive/anonymous/64f4d795-b68c-4d6f-987d-dc1774bc3edc/hive_2021-01-25_14-58-28_239_344285465562172971-505/-mr-10002/b2b74a17-2e59-46bd-a45f-27b2f940cbeb/map.xml
2021-01-25 14:58:45,457 ERROR [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:973)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:154)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1731)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
DeserializeRead detail: Reading byte[] of length 262144 at start offset 0 for length 181063 to read 26 fields with types [bigint, string, string, string, string, string, string, string, string, int, string, string, string, string, string, string, string, string, double, double, string, array<string>, array<string>, array<string>, array<string>, array<string>]. Read field #23 at field start position 2925 for field length 45317
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:928)
... 10 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:187)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storePrimitiveRowColumn(VectorDeserializeRow.java:588)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeComplexFieldRowColumn(VectorDeserializeRow.java:778)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeListRowColumn(VectorDeserializeRow.java:822)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(VectorDeserializeRow.java:938)
at org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(VectorDeserializeRow.java:1360)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:923)
... 10 more
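My guess (not verified) is that this is related to vectorized execution: the 1024 in the exception matches Hive's default vectorized row batch size, and my array columns can contain far more than 1024 elements. If so, something like the following session setting might work around it; the target table name `es_hospital` is just a placeholder for my actual ES-backed table:

```sql
-- Hedged guess: disable vectorized execution for this query, since
-- BytesColumnVector seems to overflow at the default batch size of 1024.
SET hive.vectorized.execution.enabled=false;

-- es_hospital is a hypothetical ES-backed external table name.
INSERT OVERWRITE TABLE es_hospital
SELECT * FROM dwd_ddjk_hospital;
```

I have not confirmed whether this avoids the error, or whether it only hides a deeper problem with complex types.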
I don't understand why an array-out-of-bounds error occurs here. Does anyone have a solution?
Best regards.