Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler

Hi All,
I am trying to integrate elasticsearch-2.1.1 with Cloudera (Hadoop 2.6.0-cdh5.5.1) using elasticsearch-hadoop-2.1.1, basically for Elasticsearch-to-Hadoop integration.
While starting Elasticsearch I am getting the below error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/DefaultStorageHandler

Version:
Hadoop version : Cloudera (Hadoop 2.6.0-cdh5.5.1)
Elasticsearch : elasticsearch-2.1.1
elasticsearch-hadoop : elasticsearch-hadoop-2.1.1
JVM : 1.7 / 1.8 (tried with both JVM versions; getting the same error)

plugin-descriptor.properties:
description=elasticsearch Hadoop
version=2.1.1
name=elasticsearch-hadoop
site=false
jvm=true
classname=org.elasticsearch.hadoop.hive.EsStorageHandler
java.version=1.7
elasticsearch.version=2.1.1
isolated=false

Please let me know your comments on this.

Kind Regards
vyas

Now I am getting the below error.

Starting elasticsearch: Exception in thread "main" java.lang.IllegalStateException: jar hell!
class: com.google.common.annotations.Beta
jar1: /usr/share/elasticsearch/lib/hive-exec.jar
jar2: /usr/share/elasticsearch/lib/guava-18.0.jar
at org.elasticsearch.bootstrap.JarHell.checkClass(JarHell.java:280)
at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:186)
at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:87)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:164)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:285)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)

@vyas your post indicates that you are confusing es-hadoop with the hdfs-repository plugin for Elasticsearch.

Please see the documentation preface which explains the difference between each one.

The classpath error comes from the fact that ES-Hadoop is not part of the Hive job classpath. More information is available in the docs, namely here.

Last but not least, I am not sure what Hive is doing inside ES itself.

I suggest taking a step back, looking at the components and understanding their purpose; clumping all of them on top of each other will not make things work, quite the opposite.

site=true
jvm=false

Thanks, you are correct. I fixed the issue by making the below change in plugin-descriptor.properties:

site=true
jvm=false

and by placing the ES-Hadoop jar under elasticsearch-hadoop-2.1.1/_site. ES now starts without any error.

Now I have indexed some documents into ES and created the external table in Hive. While running a query I am getting the below error.

hive> select * from foo_analysis_doc_ids;

OK

Failed with exception java.io.IOException:java.lang.StringIndexOutOfBoundsException: String index out of range: -1

Below are the actions performed.

curl -v -XPUT http://x.x.x.x:9200/idx_foo

curl -XGET http://x.x.x.x:9200/idx_foo

curl -v -XPUT 'http://107.115.212.95:9200/idx_foo/_mapping/analysis?ignore_conflicts=true' -d '{

"analysis" : {
"properties" : {
"analyserVersion" : {
"type" : "string",
"store" : true,
"index" : "not_analyzed"
},
"analysisTime" : {
"type" : "date",
"format" : "date_optional_time"
},
"documentIdentifier" : {
"type" : "string",
"store" : true,
"index" : "not_analyzed"
},
"topics" : {
"type" : "string"
}
}
}
}'

curl -v -XPOST 'http://x.x.x.x:9200/_bulk' -d '{ "create": { "_index": "idx_foo", "_type": "analysis" } }
{ "analyserVersion": "0.0.1", "analysisTime": "2015-03-07T16:12:00Z", "documentIdentifier": "doc1", "topics": ["business", "collaboration", "lifehacks", "email"] }
{ "create": { "_index": "idx_foo", "_type": "analysis" } }
{ "analyserVersion": "0.0.1", "analysisTime": "2015-03-07T16:12:01Z", "documentIdentifier": "doc2", "topics": ["sports", "basketball"] }
{ "create": { "_index": "idx_foo", "_type": "analysis" } }
{ "analyserVersion": "0.0.1", "analysisTime": "2015-03-07T16:12:02Z", "documentIdentifier": "doc3", "topics": ["lifehacks", "wardrobe"] }
{ "create": { "_index": "idx_foo", "_type": "analysis" } }
{ "analyserVersion": "0.0.1", "analysisTime": "2015-03-07T16:12:03Z", "documentIdentifier": "doc4", "topics": ["sports", "snowboarding", "mountains"] }
{ "create": { "_index": "idx_foo", "_type": "analysis" } }
{ "analyserVersion": "0.0.1", "analysisTime": "2015-03-07T16:12:04Z", "documentIdentifier": "doc5", "topics": ["outdoors", "hiking", "mountains", "lakes"] }
{ "create": { "_index": "idx_foo", "_type": "analysis" } }
{ "analyserVersion": "0.0.1", "analysisTime": "2015-03-07T16:12:05Z", "documentIdentifier": "doc6", "topics": ["business", "sales", "cars"] }
{ "create": { "_index": "idx_foo", "_type": "analysis" } }
{ "analyserVersion": "0.0.1", "analysisTime": "2015-03-07T16:12:06Z", "documentIdentifier": "doc7", "topics": ["jeep", "cars", "outdoors", "off-roading"] }
{ "create": { "_index": "idx_foo", "_type": "analysis" } }
{ "analyserVersion": "0.0.1", "analysisTime": "2015-03-07T16:12:07Z", "documentIdentifier": "doc8", "topics": ["sports", "fly fishing", "lakes"] }
{ "create": { "_index": "idx_foo", "_type": "analysis" } }
{ "analyserVersion": "0.0.1", "analysisTime": "2015-03-07T16:12:08Z", "documentIdentifier": "doc9", "topics": ["programming", "clojure", "databases"] }'
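For reference, the _bulk payload above is newline-delimited JSON: every action line (`{ "create": ... }`) must be immediately followed by its source document on the next line, and the whole body has to end with a trailing newline. A minimal sketch of assembling such a body (the helper name and the hard-coded index/type are purely illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class BulkBodyDemo {
    // Builds an NDJSON body for the _bulk endpoint: one action line
    // immediately followed by one source line per document, with a
    // trailing newline at the very end of the body.
    static String bulkBody(List<String> sources) {
        StringBuilder sb = new StringBuilder();
        for (String source : sources) {
            sb.append("{ \"create\": { \"_index\": \"idx_foo\", \"_type\": \"analysis\" } }\n");
            sb.append(source).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = bulkBody(Arrays.asList(
                "{ \"documentIdentifier\": \"doc1\", \"topics\": [\"business\"] }",
                "{ \"documentIdentifier\": \"doc2\", \"topics\": [\"sports\"] }"));
        System.out.print(body); // two action/source pairs, four lines total
    }
}
```

Note that a stray blank line between an action line and its source line, or a missing trailing newline, will make the _bulk request fail to parse.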

CREATE DATABASE es_test;
USE es_test;

CREATE EXTERNAL TABLE foo_analysis_doc_ids (
analyser_version STRING,
analysis_time TIMESTAMP,
doc_id STRING,
topics ARRAY&lt;STRING&gt;
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'idx_foo/analysis',
'es.mapping.names' = 'doc_id:documentIdentifier, analysis_time:analysisTime, analyser_version:analyserVersion');
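The `es.mapping.names` value used above is a comma-separated list of `hive_column:es_field` pairs; columns not listed (like `topics`) keep their Hive name. As a quick illustration of how such a value decomposes (this parser is purely illustrative, not es-hadoop's own code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappingNamesDemo {
    // Splits an es.mapping.names-style value ("hiveCol:esField, ...")
    // into Hive-column -> ES-field pairs. Illustrative only.
    static Map<String, String> parse(String value) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : value.split(",")) {
            String[] kv = pair.trim().split(":", 2);
            out.put(kv[0], kv[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> m = parse(
                "doc_id:documentIdentifier, analysis_time:analysisTime, analyser_version:analyserVersion");
        System.out.println(m.get("doc_id")); // documentIdentifier
    }
}
```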

While inserting into the table I am getting the error:

“Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row”

Below are the steps performed.

Step 1:
curl -XPUT http://107.115.212.9:9200/driverdata?pretty -d'{"rels":{"properties":{"givename":{"type":"string"},"surname":{"type":"string"},"city":{"type":"string"},"state":{"type":"string"}}}}'

Step 2:

CREATE TABLE els_testing.testing
(
givename string,
surname string,
city string,
state string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "|" LINES TERMINATED BY "\n" STORED AS TEXTFILE;

Step 3 :
CREATE EXTERNAL TABLE els_testing.DRIVERDATA_ES_JAN3
(
givename string,
surname string,
city string,
state string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.resource' = 'driverdata/data',
'es.nodes'='107.115.212.9:9200',
'es.index.auto.create' = 'false',
'es.mapping.id'='state');

INSERT INTO TABLE els_testing.testingmapp
SELECT
givename,
surname,
city,
state
FROM
els_testing.testing;

After that I am getting the below error:

Query ID = hdfs_20160205214747_6b43a8d2-2421-4591-b0f8-781cbcda9bdf
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1454664396911_0220, Tracking URL = http://srvbdadvlsk01.santanderuk.pre.corp:8088/proxy/application_1454664396911_0220/
Kill Command = /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/bin/hadoop job -kill job_1454664396911_0220
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2016-02-05 21:47:48,802 Stage-0 map = 0%, reduce = 0%
2016-02-05 21:48:15,530 Stage-0 map = 100%, reduce = 0%
Ended Job = job_1454664396911_0220 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1454664396911_0220_m_000000 (and more) from job job_1454664396911_0220

Task with the most failures(4):

Task ID:
task_1454664396911_0220_m_000000

URL:
http://srvbdadvlsk01.santanderuk.pre.corp:8088/taskdetails.jsp?jobid=job_1454664396911_0220&tipid=task_1454664396911_0220_m_000000

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"givename":"kumar","surname":"krishnan","city":"adurai","state":"amilNadu"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"givename":"kumar","surname":"krishnan","city":"adurai","state":"amilNadu"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1911)
at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:110)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:58)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:374)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:173)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:695)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 9 more

ES-Hadoop 2.2 supports both ES 2.x and ES 1.x. Lower versions support only ES 1.x, hence the exception you are getting.

P.S. Please format your posts (it's super easy), since they are quite hard to read (and thus to answer).
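To make the version point concrete: the `StringIndexOutOfBoundsException: String index out of range: -1` in the stack traces above comes out of `RestClient.discoverNodes`, where an ES-1.x-era client parses the node address it gets back from the cluster. The sketch below is hypothetical (not the actual es-hadoop source) but shows the failure shape: ES 1.x published addresses like `inet[/10.0.0.1:9200]`, ES 2.x dropped that wrapper, and when `indexOf` cannot find the expected delimiter it returns -1, which `String.substring` rejects with exactly this exception.

```java
public class AddressParseDemo {
    // Hypothetical parser for a 1.x-style publish address such as
    // "inet[/10.0.0.1:9200]": keep what sits between '/' and ']'.
    // NOT the real es-hadoop code, just the failure mode it exhibits.
    static String parse(String publishAddress) {
        // With a 2.x-style address ("10.0.0.1:9200") both indexOf calls
        // return -1, so substring(0, -1) throws
        // StringIndexOutOfBoundsException.
        return publishAddress.substring(publishAddress.indexOf('/') + 1,
                                        publishAddress.indexOf(']'));
    }

    public static void main(String[] args) {
        System.out.println(parse("inet[/10.0.0.1:9200]")); // 10.0.0.1:9200
        try {
            parse("10.0.0.1:9200"); // 2.x-style address
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("2.x address fails: " + e.getMessage());
        }
    }
}
```

The fix is not to massage the address but to use an es-hadoop release (2.2+) that understands the 2.x format.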

Hi, as you said ES-Hadoop 2.2 supports ES 2.x. I have

ADD JAR /home/cloudera/elasticsearch-2.3.5/lib/elasticsearch-hadoop-2.1.1.jar;

and my ES version is 2.3.5.

I am also getting the same error. Please allow me to explain in full; it will be a bit lengthy, as I need to send you all the details:

In the hive shell I am firing the below command:

hive> ADD JAR /home/cloudera/Downloads/elasticsearch-hadoop-2.1.1.jar;
Added [/home/cloudera/Downloads/elasticsearch-hadoop-2.1.1.jar] to class path
Added resources: [/home/cloudera/Downloads/elasticsearch-hadoop-2.1.1.jar]

hive> CREATE EXTERNAL TABLE mcollect(Product string)

ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = 'http://localhost:9200/', 'es.index.auto.create' = 'true', 'es.resource' = 'ecommerce/Product', 'es.mapping.Product' = 'Product:esproduct');
OK
Time taken: 0.638 seconds
When I try the below, I am getting the following error:

hive> INSERT OVERWRITE TABLE mcollect select Product from sales2;

Query ID = cloudera_20160831013434_9d2882a6-67fa-40e4-9902-8b59d01428ab
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1472630763866_0002, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1472630763866_0002/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1472630763866_0002
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2016-08-31 01:34:14,895 Stage-0 map = 0%, reduce = 0%
2016-08-31 01:34:49,589 Stage-0 map = 100%, reduce = 0%
Ended Job = job_1472630763866_0002 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://quickstart.cloudera:8088/proxy/application_1472630763866_0002/
Examining task ID: task_1472630763866_0002_m_000000 (and more) from job job_1472630763866_0002

Task with the most failures(4):
Task ID:
task_1472630763866_0002_m_000000

URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1472630763866_0002&tipid=task_1472630763866_0002_m_000000

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"product":"prodq"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"product":"prodq"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -11
at java.lang.String.substring(String.java:1911)
at org.elasticsearch.hadoop.rest.RestClient.discoverNodes(RestClient.java:110)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:58)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:374)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:173)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:697)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 9 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec.

The structure of the sales2 table is as below:
create external table sales2(Product string) row format delimited fields terminated by ',' location '/user/cloudera/dir1';

Kindly help me with the above issue.

Thanks
Prosenjit