Integration with Hive


#1

Hi All,

This is regarding an error we are getting while inserting data from Hive into the cluster with ID "bbfd9d".
We are trying to insert data from a Hive table (an external table backed by S3). Previously we ran our own Elasticsearch cluster and were able to insert documents into the index. However, we are now migrating to the elastic.co cloud, and when we try to insert documents from Hive we get the error: "Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null".

The current ES settings for the Hive S3 external table are:

'es.nodes'='XXX.aws.found.io',
'es.port'='9200',
'es.net.http.auth.user'='elastic',
'es.net.http.auth.pass'='XXXXXX'
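For context, these properties normally sit in the TBLPROPERTIES of the ES-backed external table. A minimal sketch of such a DDL is below; the table name, columns, and es.resource value are placeholders, not taken from our actual setup:

```sql
-- Hypothetical sketch: an ES-backed Hive external table with the
-- settings above (table/column names and es.resource are placeholders).
CREATE EXTERNAL TABLE es_topics (
  topic_id    STRING,
  topic_title STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource'            = 'my_index/my_type',
  'es.nodes'               = 'XXX.aws.found.io',
  'es.port'                = '9200',
  'es.net.http.auth.user'  = 'elastic',
  'es.net.http.auth.pass'  = 'XXXXXX'
);
```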

Also, the ES version on the cluster is 5.1, and the JAR file used on the Hive machine to insert documents is also version 5.1.

Can you please let us know if we are missing any ES settings for inserting data into the index?

Thanks,
Adish


(Igor KupczyƄski) #2

Hi Adish,

You may have better luck asking your question in the ES-Hadoop part of the forum: https://discuss.elastic.co/c/elasticsearch-and-hadoop

Nevertheless, one thing that crosses my mind is setting es.nodes.wan.only to true in the Hadoop connector.

es.nodes.wan.only (default false)
Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon Web Services. In this mode, the connector disables discovery and only connects through the declared es.nodes during all operations, including reads and writes. Note that in this mode, performance is highly affected.

Src: https://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html
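On the Hive side, the flag can be added to an existing ES-backed table's properties; a sketch (the table name is a placeholder):

```sql
-- Sketch: set the WAN-only flag on an existing ES-backed Hive table.
ALTER TABLE es_topics SET TBLPROPERTIES ('es.nodes.wan.only' = 'true');
```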

Can you check that?

Thanks,
Igor


#3

Hi @igor_k,

Thank you for your inputs.
As suggested, we changed the es.nodes.wan.only parameter to true.
However, we still received an error. We then also set es.nodes.resolve.hostname=true.

After this we received the following error:
'Job Submission failed with exception 'org.elasticsearch.hadoop.EsHadoopIllegalArgumentException(Target index [XXXX/XXX] does not exist and auto-creation is disabled [setting 'es.index.auto.create' is 'false'])'

However, the index was already present when queried via the REST API. I am checking which setting I might have missed here.
For the time being I have enabled the es.index.auto.create option; however, even after an hour no data has been inserted into the index and the mapper is still running on the Hadoop side.
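One way to cross-check what the connector sees is to query the cluster from the Hive machine itself, using the same endpoint and credentials as in the table properties. A sketch (host, credentials, and index/type names are placeholders matching the redacted values above):

```shell
# Sketch: verify the index and its type mapping exist, from the same
# machine and with the same credentials the connector uses.
curl -u elastic:XXXXXX "http://XXX.aws.found.io:9200/test_influencer_run"
curl -u elastic:XXXXXX "http://XXX.aws.found.io:9200/test_influencer_run/_mapping/topic"
```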

I have also moved this question to the Hadoop and Elasticsearch forum.

Thanks.
Adish


(James Baiera) #4

Hi @cm_a, could you include full stack traces of both issues you have detailed above? These are incredibly helpful. Could you also increase your logging levels to trace and include those as well? This allows us to check the actual requests and responses that the connector is making back to Elasticsearch when negotiating the job start up and initialization.
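Assuming a standard log4j.properties setup on the Hive/Hadoop side, trace logging for the connector's REST layer can be enabled with a line like the following (the exact file location depends on the distribution):

```
# Sketch: TRACE-level logging for the ES-Hadoop REST layer in log4j.properties.
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
```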


#5

Hi @james.baiera,

We have disabled auto-creation of the index and manually created the index on the cluster. However, when we then try to insert data, we receive an error. Below is the full stack trace:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Target index [test_influencer_run/topic] does not exist and auto-creation is disabled [setting 'es.index.auto.create' is 'false']
at org.elasticsearch.hadoop.rest.InitializationUtils.doCheckIndexExistence(InitializationUtils.java:276)
at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexExistence(InitializationUtils.java:266)
at org.elasticsearch.hadoop.mr.EsOutputFormat.init(EsOutputFormat.java:260)
at org.elasticsearch.hadoop.mr.EsOutputFormat.checkOutputSpecs(EsOutputFormat.java:251)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1140)
at org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:433)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'org.elasticsearch.hadoop.EsHadoopIllegalArgumentException(Target index [test_influencer_run/topic] does not exist and auto-creation is disabled [setting 'es.index.auto.create' is 'false'])'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Target index [test_influencer_run/topic] does not exist and auto-creation is disabled [setting 'es.index.auto.create' is 'false']

Please let me know if you require any further details.

Thanks,
Adish


(James Baiera) #6

@cm_a Have you defined the type's mapping as well, or just created the index? Both the index and the type with its mapping must be present for the connector to continue operation.


#7

Hi @james.baiera,

We have defined the type mapping as well. Please take a look at the JSON below, which was used to create the index, and let me know if we are missing something.

{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "topic": {
      "properties": {
        "channelId": {
          "index": "not_analyzed",
          "type": "string"
        },
        "topicId": {
          "index": "not_analyzed",
          "type": "string"
        },
        "topic_title": {
          "type": "string",
          "index": "analyzed"
        },
        "topic_type": {
          "type": "string"
        },
        "channel_title": {
          "type": "multi_field",
          "fields": {
            "channel_title": {
              "type": "string"
            },
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "channel_description": {
          "type": "string",
          "index": "analyzed"
        },
        "influencer_type": {
          "type": "string"
        },
        "countryCode": {
          "type": "string"
        },
        "thumbnail": {
          "type": "string"
        },
        "topicPlatform": {
          "type": "string"
        },
        "viewsCount": {
          "type": "double"
        },
        "commentsCount": {
          "type": "double"
        },
        "videosCount": {
          "type": "double"
        },
        "likesCount": {
          "type": "double"
        },
        "creatorsCount": {
          "type": "double"
        },
        "subscribersCount": {
          "type": "double"
        },
        "views_per_videosCount": {
          "type": "double"
        },
        "comments_per_videosCount": {
          "type": "double"
        },
        "likes_per_videosCount": {
          "type": "double"
        },
        "shares_per_videosCount": {
          "type": "double"
        },
        "format": {
          "type": "string",
          "index": "analyzed"
        },
        "owned_type": {
          "type": "string",
          "index": "analyzed"
        },
        "shares": {
          "type": "double"
        },
        "category": {
          "type": "string",
          "index": "analyzed"
        },
        "search_tags": {
          "type": "string",
          "index": "analyzed"
        },
        "publishedBy": {
          "type": "string"
        },
        "published_at": {
          "type": "date",
          "format": "yyyy-MM-dd"
        },
        "max_video_publishedAt": {
          "type": "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}
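(One thing worth checking: this mapping uses pre-5.x syntax, such as "string" fields with "index": "not_analyzed"/"analyzed" and the "multi_field" type. On an Elasticsearch 5.1 cluster, the usual equivalents are "keyword" and "text" fields. A sketch for three of the fields above, as an illustration only:)

```json
{
  "channelId":   { "type": "keyword" },
  "topic_title": { "type": "text" },
  "channel_title": {
    "type": "text",
    "fields": { "raw": { "type": "keyword" } }
  }
}
```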

Thanks,
Adish


(system) #8

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.