ES-hive query error--EsHiveSplit not found

ES version: 2.4.1
ES-Hadoop jar version: 2.4.0
CDH version: 5.8.3

Hello All:

I am using the elasticsearch-hadoop jar to connect ES and Hive. Here is my CREATE TABLE statement:

CREATE EXTERNAL TABLE weather_observe_t (
	errorcode	string,
	time	string)
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes' = '10.24.56.99',
'es.port' =  '9200',
'es.resource' = 'logstash-weather-observe/weatherlog',
'es.query' = '?q=*',
'es.field.read.empty.as.null' = 'false',
'es.mapping.names' = 'errorcode:errorCode');

I executed a SELECT statement:
select * from weather_observe_t;
It completed and returned the result.

Then I executed another query:
select * from weather_observe_t order by time desc limit 10;

But I got the error:
Error: java.io.IOException: Cannot create an instance of InputSplit class = org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit:Class org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit not found
    at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:166)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:372)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:163)
    ... 10 more

Any help is greatly appreciated.

Is the ES-Hadoop jar correctly distributed to the cluster running Hive? Hive can normally serve a basic select * query in the local process, since it just reads the data as is from the source, but any further operation such as grouping or sorting has to be handled by the distributed portion of Hive. In this case the most likely cause is that the second query submits a job and the ES-Hadoop jar is not available on the job workers' classpaths.
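One way to check this is to register the jar explicitly for the current Hive session, so that it gets shipped along with any job the session submits. This is only a rough sketch: the jar path below is a placeholder I made up, not something from your setup, so substitute wherever your elasticsearch-hadoop-2.4.0 jar actually lives.

```sql
-- Hypothetical path (an assumption): replace with the real location of the 2.4.0 jar.
ADD JAR /path/to/elasticsearch-hadoop-2.4.0.jar;

-- Show the jars registered in this session, to confirm that only the
-- intended version is listed.
LIST JARS;

-- Re-run the query that has to launch a distributed job.
select * from weather_observe_t order by time desc limit 10;
```

If the query succeeds after this, the jar was most likely missing from the workers' classpaths. ADD JAR only lasts for the session; for a permanent setup the jar is usually made available through the hive.aux.jars.path property instead, and how that property is managed depends on your distribution.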

I made a mistake earlier.

I added elasticsearch-hadoop-2.4.0 to the Hive lib directory, but I forgot that elasticsearch-hadoop-2.3.2 was already in it. I then removed the 2.3.2 jar from the lib directory directly.

I think Hive may still be trying to use the old 2.3.2 jar but can no longer find it. How can I point Hive at the 2.4.0 jar?
