RoR/es-hadoop connector/Hive problem

I'm experiencing an issue when querying ES with RoR (ReadonlyREST) from an external Hive table that has the property 'es.net.http.header.Authorization'='Basic (base64 encoded user:pwd)' in its table definition.

Running a select statement against the table outputs the error "Failed with exception java.io.IOException:org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [HEAD] on
[index_name] failed; server[server_name:port] returned [401|Unauthorized:]"

Running cURL against the same ES index with the same user/pwd authorization (curl -XGET -H 'Authorization: Basic (base64 encoded user:pwd)' 'http://server:port/index/_search') returns the correct response and proves that RoR/ES are configured properly.
Also, the same table without 'es.net.http.header.Authorization' in its definition works fine.
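
For reference, a sketch of how the Base64 value is produced and how the connector's failing HEAD call can be replicated with cURL (user:pwd, server, port and index_name below are placeholders):

# build the header value from placeholder credentials
AUTH=$(echo -n 'user:pwd' | base64)
# the connector's error reports a HEAD request on the index, so replicate exactly that call
curl -I -H "Authorization: Basic ${AUTH}" 'http://server:port/index_name'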

Environment: RHEL 6.6, open jdk 1.8.0_141, ES 5.6.3, RoR 1.16.14_es5.6.3, Hadoop 2.7, Hive 1.2.100.2.6.3, elasticsearch-hadoop-6.0.0.jar.

Has anyone faced this issue or successfully queried Elasticsearch 5.6 with the Readonlyrest plugin from an external Hive table using es.net.http.header.Authorization?
Any help or suggestion will be appreciated.

Are you able to get a successful response by replicating the failed request with curl -I -H 'Authorization: Basic (base64 encoded user:pwd)' 'http://server:port/index' ?

Yes, the successful response is:
HTTP/1.1 200 OK
x-ror-available-groups: team3
X-RR-User: tom
Content-Type: application/json; charset=UTF-8
Content-Length: 907

That's interesting. It should be noted that we do not currently test the project against the ROR plugin, so there may be issues with how the two handle the communication. Could you include the TRACE level logs for the org.elasticsearch.hadoop.rest.commonshttp package on the Spark worker side when it fails to run?

Here are some details and what I'm trying to achieve.
I'm trying to configure and run Elasticsearch 5.6.3 with the Readonlyrest plugin and Hadoop/Hive (es-hadoop-6.0 connector) on RHEL, on different servers. Hadoop 2.7.3.2.6.3.0-235, Hive 1.2.1.100.2.6.3.0-235.
On the Elasticsearch side, the readonlyrest plugin is a set of jar files inside the $ES_HOME/plugins/readonlyrest folder plus the readonlyrest.yml file in the $ES_HOME/plugins/config folder.
I successfully tested querying Elasticsearch using a Hive external table with the Readonlyrest plugin disabled.
When I enabled the Readonlyrest plugin and restarted Elasticsearch, execution of "create external table ..." failed.
Do I need to add some configuration property/variable, like es.xyz or anything else, to the Hive external table definition or to the current Hive session in order to let the Hive table and/or es-hadoop connector know that Elasticsearch is running with the Readonlyrest plugin (or any other plugin) enabled?

I understand that it will be much harder to resolve the issue without detailed TRACE level logs, but I'm not allowed to disclose any sensitive information. Also, I don't understand how to "include the TRACE level logs for org.elasticsearch.hadoop.rest.commonshttp package on the Spark worker side" because I'm not running Spark at all.
Thanks in advance.

Would you be able to reproduce the error with some rudimentary test data in a local environment?

Terribly sorry, force of habit. Find-Replace "Spark worker" with "HiveServer / Job Executors". Generally this involves updating your log4j settings on Hive and Hadoop.
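
A minimal sketch, assuming log4j 1.x properties files and treating /etc/hive/conf/hive-log4j.properties as a placeholder path (the exact location varies by distribution):

# append a TRACE logger for the connector's HTTP layer to the Hive log4j file
cat >> /etc/hive/conf/hive-log4j.properties <<'EOF'
log4j.logger.org.elasticsearch.hadoop.rest.commonshttp=TRACE
EOF

Do the same on the Hadoop side so the job executors pick it up as well.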

I'm testing the es-hadoop 6.0 connector with ES-5 and the Readonlyrest plugin by granting/revoking access to the ES-5 server from the Hadoop/Hive server.

Note: I already tested ES-5 with the Readonlyrest plugin successfully when making requests with cURL.

ES-5.6.3 with the Readonlyrest plugin enabled is running on IP xx.xxx.xx.55, port 9200
readonlyrest.yml
readonlyrest:
  enable: true
  response_if_req_forbidden: Access denied by ROR!!!
  access_control_rules:
  - name: "Accept all req from hosts"
    hosts: [xx.xxx.xx.99]

Hadoop/Hive with es-hadoop-6.0 connector is running on IP xx.xxx.xx.99

READ from ES+ROR

Test Case 1: enabled access to the ES server from the Hadoop server

by setting hosts: [xx.xxx.xx.99] in readonlyrest.yml and restarting ES-5

hive> CREATE EXTERNAL TABLE from_es_with_ror (
title string,
author string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes'='xx.xxx.xx.55:9200', 'es.resource.read'='books/kindle');

hive> select * from from_es_with_ror;
OK
swimming guide, John Dow
running, Angela Devis

This test case worked fine !

Test Case 2: disabled access to the ES server from the Hadoop server

by setting hosts: [xx.xxx.xx.00] in readonlyrest.yml and restarting ES-5
hive> select * from from_es_with_ror;
OK
Failed with exception java.io.IOException:org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest:
[HEAD] on [books] failed; server[xx.xxx.xx.55:9200] returned [403|Forbidden:]

That is correct, access is forbidden, but I'm not sure whether this test case is OK and the es-hadoop connector handled it properly, because I expected to see the "Access denied by ROR!!!" message (defined in readonlyrest.yml) in the response, as I did when running a cURL -XGET request.
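
For comparison, here is a sketch of the two calls being compared (host and index taken from the test above, placeholders otherwise): the connector's failing call is the index-existence HEAD check reported in the error, while my cURL test was a GET search.

# the call the connector's error reports: HEAD on the index
curl -I 'http://xx.xxx.xx.55:9200/books'
# the call I tested manually: GET search against the index
curl -XGET 'http://xx.xxx.xx.55:9200/books/_search'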

WRITE to ES+ROR

Test Case 3: enabled access to the ES server from the Hadoop server

by setting hosts: [xx.xxx.xx.99] in readonlyrest.yml and restarting ES-5

hive> CREATE EXTERNAL TABLE to_es_with_ror (
id string,
title string,
author string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes'='xx.xxx.xx.55:9200', 'es.resource.write'='books/kindle',
'es.write.operation'='index', 'es.index.auto.create'='true', 'es.mapping.id'='id',
'es.mapping.names'='title:title, author:author');

hive> insert into table to_es_with_ror values ('5', 'sport', 'michael');
ERROR main: SessionState.java:printError - Vertex failed, vertexName=Map 1...
caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row {"tmp_values_col1":"5", "tmp_values_col2":"sport", "tmp_values_col3":"michael"}
...
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest:
Access denied by ROR!!! null .... did not succeed due to VERTEX_FAILURE. failed Vertices:1

And here is the es.log content on server xx.xxx.xx.55 where ES+ROR are running:
... FORBIDDEN by default req={...., HIS: [Accept all req from hosts->[hosts->false]]}

This test case does not work properly: it should allow indexing/writing into ES+ROR, but it does not, and the "Access denied by ROR!!!" message (defined in readonlyrest.yml) appears in the response.

Am I missing something, or might this be related to the es-hadoop connector's behavior when making requests from Hadoop/Hive to ES with the ROR plugin enabled?

Thanks in advance

The goal is to test the es-hadoop connector along with ES and the Readonlyrest plugin while enabling/disabling access to ES in readonlyrest.yml at the server level (hosts: [ip addr of hive server]) or at the user/index level (using a sha256-hashed user:pwd value and an Authorization header in the request), as follows:
(1) Reading from ES to Hive
(a) server-level enabling/disabling of reads - works well
(b) user/index-level enabling/disabling of reads - works well, but when reads are disabled the response does not contain the message declared in readonlyrest.yml
(2) Writing to ES from Hive
(a) server-level enabling/disabling of writes - doesn't work
(b) user/index-level enabling/disabling of writes - doesn't work

Here is my recent use case, which failed.

Readonlyrest is deployed and enabled on the master node, port 9200, on the server with IP 11.111.11.88
(1) readonlyrest.yml

readonlyrest:
  enable: true
  response_if_req_forbidden: access denied by ror
  access_control_rules:
  - name: "accept requests from host xyz"
    hosts: ["11.111.11.55"]

11.111.11.55 is the IP address of the server where the Hadoop/Hive external table is declared and the insert statement is executed.

The table definition
hive> CREATE EXTERNAL TABLE IF NOT EXISTS write_to_es_ror_enable_xyz_9200 (
title string,
author string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes'='11.111.11.88',
'es.port'='9200',
'es.resource.write.'='books/kindle',
'es.index.auto.create'='true',
'es.write.operation'='index',
'es.mapping.names'='title:title, author:author');

hive> insert into write_to_es_ror_enable_xyz_9200 values ('vdtitle', 'vdauthor');

After running this insert statement, the data was not written to ES and there was no relevant entry in es.log.

Hive.log shows the error:
..........
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row {'tmp_values_col1':'vdtitle','tmp_values_col2':'vdauthor',}
at org.apache.hadoop .....
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: access denied by ror
null
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:510)
..........

What is interesting here is the line with "access denied by ror", which is the exact value of response_if_req_forbidden defined in (1) readonlyrest.yml.

Also, after running "cURL -XPUT ..." from the allowed server (IP 11.111.11.55) and from a disallowed one (11.111.11.99), I can see the corresponding "ALLOWED" and "FORBIDDEN" entries in es.log.

I'm not sure what this means.
Could it be that the request from Hadoop/Hive reached ES+ROR, was handled somehow, and ES+ROR sent the response back without writing anything to es.log?

I'm continuing to work on the issue and any help will be appreciated.

Does anybody know how and where to inspect the request attributes generated by the es-hadoop connector and sent to Elasticsearch?

The problem: I'm running es-hadoop, inserting into a Hive external table to write an index into ES-5 (with the readonlyrest security plugin on a particular master node), and the index is not being written into ES.
The multinode ES cluster and the Hive server are running on RHEL, on different servers.

I need to figure out where the request gets corrupted - on the es-hadoop connector side or on the ES side.
First of all, I'd like to make sure that the request leaving the es-hadoop connector is constructed properly - how can I inspect or prove that?

Also note:
(1) Running cURL on the Hive server to write/index to ES with readonlyrest enabled works fine and the index is written to ES via the master node with the readonlyrest plugin, so there is no connection problem between the Hive and ES servers:
cURL -XPUT -H 'Authorization: Basic <Base64(tom:tompwd)>' 'http://11.111.11.11:9200/ind/type/id' -d '{"title":"testtitle", "author":"testauthor"}'
(2) Writing from Hive to ES with no readonlyrest works well too.

Below are the configuration file, the Hive table/insert, and the log files.

readonlyrest.yml
readonlyrest:
  enable: true
  response_if_req_forbidden: access denied by ror
  access_control_rules:
  - name: "accept requests from users in group team2 on index books"
    type: allow
    groups: ["team2"]
    actions: ["indices:data/write/*", "indices:data/read/*"]
    indices: ["books", "flowers"]
  users:
  - username: tom
    auth_key_sha256: ab3.....1k
    groups: ["team2"]
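
(For reference, a sketch of how the auth_key_sha256 and the Basic Authorization values are generated, assuming auth_key_sha256 is the SHA-256 of the user:password pair; tom:tompwd below is a placeholder, not the real credentials.)

# value for auth_key_sha256 in the config above
echo -n 'tom:tompwd' | sha256sum
# value for the Basic Authorization header sent by the client
echo -n 'tom:tompwd' | base64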

hive> CREATE EXTERNAL TABLE IF NOT EXISTS write_to_es_ror_enable_xyz_9200 (
title string,
author string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes'='11.111.11.88',
'es.port'='9200',
'es.resource.write.'='books/kindle',
'es.index.auto.create'='true',
'es.write.operation'='index',
'es.mapping.names'='title:title, author:author',
'es.net.http.header.Authorization'='Basic <Base64(tom:tompwd)>');

hive> insert into write_to_es_ror_enable_xyz_9200 values ('testtitle', 'testauthor');

hive.log
ERROR [Atlas Logger 0]: httpclient.HttpMethodDirector.java:processAuthenticationResponse(671)
org.apache.commons.httpclient.auth.MalformedChallengeException: Invalid challenge: Basic
...
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper settings 'es.nodes.wan.only'
.....
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: access denied by ror
.....

es.log shows the request attributes and they seem to be definitely wrong:
FORBIDDEN by default req={ID:123…, TYP:MainRequest, CGR:N/A, USR:tom, BRS:true, ACT:cluster:monitor/main, OA:11.111.11.55, IDX:<N/A>, MET:GET, PTH:/, CNT:<N/A>, HDR:{Accept=application/json, Authorization=Basic <Base64(tom:tompwd)>, content-length=0, Content-type=application/json,
Host=11.111.11.88:9200, User-Agent=Jakarta Commons-HttpClient/3.1}, HIS:[Accept all requests from users in team2 on index books->[auth_key_sha256->true, indices->true, actions->false]]}

As mentioned above, setting TRACE level logging for the org.elasticsearch.hadoop.rest.commonshttp package should log all HTTP requests to and responses from Elasticsearch on the connector side. Additionally, you could use something like tcpdump to watch the actual bytes go by, if your infrastructure allows for such tools. Hopefully this helps!
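
A quick sketch of the tcpdump option, run on the Hive/Hadoop host (the interface, ES address, and port below are placeholders, and the capture typically needs root):

# print the HTTP traffic between the connector and the Elasticsearch node
tcpdump -i eth0 -A 'host 11.111.11.88 and tcp port 9200'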

Hi James,
I should set up the TRACE level log for the es-hadoop connector inside /etc/hive/2.6.4.0-91/0/hive-log4j.properties, correct?
Also, should it be log4j.logger.org.elasticsearch.hadoop.rest.commonshttp=TRACE or log4j.category.org.elasticsearch.hadoop.rest.commonshttp=TRACE?
Then run the Hive insert and look at the log inside /tmp/[user]/hive.log?
Thanks
