Integration with Hive

Hi All,

This is regarding an error we are getting while inserting data from Hive into the cluster with ID "bbfd9d".
We are trying to insert data from a Hive table (an external table backed by S3). Previously we had our own Elasticsearch cluster and we were able to insert documents into the index. However, we are now migrating to the elastic.co cloud, and when we try to insert documents from Hive we get the error: "Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: null".

The current ES settings in the Hive S3 external table are:

'es.nodes'='XXX.aws.found.io',
'es.port'='9200',
'es.net.http.auth.user'='elastic',
'es.net.http.auth.pass'='XXXXXX'
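
For context, these properties sit in a table definition along the following lines. This is only a sketch; the table name, columns, and es.resource value here are placeholders, not our actual definition:

CREATE EXTERNAL TABLE es_topics (
  channelId STRING,
  topicId STRING
  -- ... remaining columns ...
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'my_index/my_type',
  'es.nodes' = 'XXX.aws.found.io',
  'es.port' = '9200',
  'es.net.http.auth.user' = 'elastic',
  'es.net.http.auth.pass' = 'XXXXXX'
);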

Also, the ES version on the cluster is 5.1, and the jar file used on the Hive machine to insert documents is also 5.1.

Can you please let us know if we are missing any ES settings for inserting data into the index?

Thanks,
Adish

Hi Adish,

You may have better luck asking your question in the ES-Hadoop part of the forum: https://discuss.elastic.co/c/elasticsearch-and-hadoop

Nevertheless, one thing that crosses my mind is setting es.nodes.wan.only: true in the Hadoop connector.
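
In Hive that can go straight into the table properties, for example (a sketch, with es_topics as a placeholder table name):

ALTER TABLE es_topics SET TBLPROPERTIES ('es.nodes.wan.only' = 'true');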

es.nodes.wan.only (default false)
Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon Web Services. In this mode, the connector disables discovery and only connects through the declared es.nodes during all operations, including reads and writes. Note that in this mode, performance is highly affected.

Src: Configuration | Elasticsearch for Apache Hadoop [master] | Elastic

Can you check that?

Thanks,
Igor

Hi @igor_k,

Thank you for your inputs.
As suggested, we have changed the es.nodes.wan.only parameter to true.
However, we still received the error. We then also set es.nodes.resolve.hostname=true.

After this we received the following error:
'Job Submission failed with exception 'org.elasticsearch.hadoop.EsHadoopIllegalArgumentException(Target index [XXXX/XXX] does not exist and auto-creation is disabled [setting 'es.index.auto.create' is 'false'])'

However, the index was already present when queried via the REST API. I am checking which settings I might have missed here.
For the time being I have enabled the option (es.index.auto.create); however, even after an hour, still no data has been inserted into the index and the mapper is still running on the Hadoop side.
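
For reference, this is how I enabled it (a sketch; es_topics stands in for our actual table name):

ALTER TABLE es_topics SET TBLPROPERTIES ('es.index.auto.create' = 'true');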

I have also moved this question to the Hadoop and Elasticsearch forum.

Thanks.
Adish

Hi @cm_a, could you include the full stack traces of both issues you have detailed above? These are incredibly helpful. Could you also increase your logging level to TRACE and include those logs as well? This allows us to check the actual requests and responses that the connector makes back to Elasticsearch when negotiating job startup and initialization.
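
If it helps, the connector's log level can usually be raised in the Hive client's log4j configuration, along these lines (a sketch; adjust to wherever your Hive logging is configured):

log4j.logger.org.elasticsearch.hadoop.rest=TRACE
log4j.logger.org.elasticsearch.hadoop.mr=TRACE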

Hi @james.baiera,

We have disabled auto-creation of the index and manually created the index on the cluster. However, after that, when we try to insert the data we receive an error; the full stack trace is below:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Target index [test_influencer_run/topic] does not exist and auto-creation is disabled [setting 'es.index.auto.create' is 'false']
at org.elasticsearch.hadoop.rest.InitializationUtils.doCheckIndexExistence(InitializationUtils.java:276)
at org.elasticsearch.hadoop.rest.InitializationUtils.checkIndexExistence(InitializationUtils.java:266)
at org.elasticsearch.hadoop.mr.EsOutputFormat.init(EsOutputFormat.java:260)
at org.elasticsearch.hadoop.mr.EsOutputFormat.checkOutputSpecs(EsOutputFormat.java:251)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1140)
at org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:433)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'org.elasticsearch.hadoop.EsHadoopIllegalArgumentException(Target index [test_influencer_run/topic] does not exist and auto-creation is disabled [setting 'es.index.auto.create' is 'false'])'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Target index [test_influencer_run/topic] does not exist and auto-creation is disabled [setting 'es.index.auto.create' is 'false']

Please let me know if you require any further details.

Thanks,
Adish

@cm_a Have you defined the type's mapping as well, or just created the index? The index must be present, as well as the type with its mapping, for the connector to continue operation.

Hi @james.baiera,

We have defined the type mapping as well. Please take a look at the JSON below, which we used to create the index, and let me know if we are missing something.

{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "topic": {
      "properties": {
        "channelId": {
          "index": "not_analyzed",
          "type": "string"
        },
        "topicId": {
          "index": "not_analyzed",
          "type": "string"
        },
        "topic_title": {
          "type": "string",
          "index": "analyzed"
        },
        "topic_type": {
          "type": "string"
        },
        "channel_title": {
          "type": "multi_field",
          "fields": {
            "channel_title": {
              "type": "string"
            },
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "channel_description": {
          "type": "string",
          "index": "analyzed"
        },
        "influencer_type": {
          "type": "string"
        },
        "countryCode": {
          "type": "string"
        },
        "thumbnail": {
          "type": "string"
        },
        "topicPlatform": {
          "type": "string"
        },
        "viewsCount": {
          "type": "double"
        },
        "commentsCount": {
          "type": "double"
        },
        "videosCount": {
          "type": "double"
        },
        "likesCount": {
          "type": "double"
        },
        "creatorsCount": {
          "type": "double"
        },
        "subscribersCount": {
          "type": "double"
        },
        "views_per_videosCount": {
          "type": "double"
        },
        "comments_per_videosCount": {
          "type": "double"
        },
        "likes_per_videosCount": {
          "type": "double"
        },
        "shares_per_videosCount": {
          "type": "double"
        },
        "format": {
          "type": "string",
          "index": "analyzed"
        },
        "owned_type": {
          "type": "string",
          "index": "analyzed"
        },
        "shares": {
          "type": "double"
        },
        "category": {
          "type": "string",
          "index": "analyzed"
        },
        "search_tags": {
          "type": "string",
          "index": "analyzed"
        },
        "publishedBy": {
          "type": "string"
        },
        "published_at": {
          "type": "date",
          "format": "yyyy-MM-dd"
        },
        "max_video_publishedAt": {
          "type": "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}
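
For completeness, the Hive table is pointed at this index and type via es.resource, roughly like so (a sketch; es_topics again stands in for our actual table name, while test_influencer_run/topic matches the error above):

ALTER TABLE es_topics SET TBLPROPERTIES ('es.resource' = 'test_influencer_run/topic');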

Thanks,
Adish

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.