Elasticsearch and Hive integration failure with es-hadoop-connector 8.8.1

Hi there,

I am trying to set up the latest Elasticsearch release (8.8.1) as a single-node service on an Ubuntu Azure VM, and to read from and write to it with Hive 3.10+ on a Hadoop HDInsight cluster (Hortonworks-based).

For the single-node ES service, I set the following config in elasticsearch.yml:

network.host: 0.0.0.0 
http.port: 9200
cluster.name: bob-es
discovery.type: single-node

# default setting from the distribution
xpack.security.enabled: true
xpack.security.enrollment.enabled: true
xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12
 
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12 
http.host: 0.0.0.0

Before interacting with Hive, I created the index and queried it with cURL without any errors.

Telnet to the Elasticsearch VM on port 9200 also works fine.
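
A successful Telnet connection only shows that the TCP port is open; it does not exercise TLS or authentication. A minimal sketch of an HTTPS check that could be run from a headnode is below. The host, port, user, and CA path are copied from the table properties in this post (the IP is the made-up one), and the helper names es_url and es_get are hypothetical.

```shell
#!/bin/sh
# Values copied from the post; the IP is the made-up one used there.
ES_HOST="101.222.154.11"
ES_PORT="9200"
CA_CERT="/home/sshuser/es-hadoop/cert/http_ca.crt"
ES_USER="elastic"

# Build an HTTPS URL for a path on the Elasticsearch node (hypothetical helper).
es_url() {
  printf 'https://%s:%s/%s\n' "$ES_HOST" "$ES_PORT" "$1"
}

# Request the given path over TLS, validating the server against CA_CERT.
# curl prompts for the password because --user is given without one.
es_get() {
  curl --fail --cacert "$CA_CERT" --user "$ES_USER" "$(es_url "$1")"
}
```

Calling es_get "" requests the root endpoint, and es_get "company/_search" queries the index. If this succeeds from the headnode, the network, certificate, and credentials are probably not the cause of the connector error.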

In a Beeline session, I did the following:

add jar /home/sshuser/es-hadoop-lib/elasticsearch-hadoop-8.8.1.jar;
add jar /home/sshuser/es-hadoop-lib/commons-httpclient-3.0.1.jar; 

I verified with list jar; that both jars are included, and also checked the HiveServer log. Both jars come from the Maven repository site.

Then I created the Hive table as follows:

CREATE EXTERNAL table IF NOT EXISTS company( 
   id BIGINT,
   name STRING,
   birth STRING,
   addr STRING 
)  
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' 
TBLPROPERTIES(  
    'es.nodes' = '101.222.154.11', -- made-up public IP of the VM
    'es.port' = '9200',
    'es.nodes.wan.only' = 'true', 
    'es.input.use.sliced.partitions'='false',
    'es.input.json' = 'false',
    'es.resource' = 'company/_doc',
    'es.net.http.auth.user' = 'elastic', 
    'es.net.http.auth.pass' = 'xxxxx',
    'es.net.ssl' = 'true',
    'es.net.ssl.cert.ca' = '/home/sshuser/es-hadoop/cert/http_ca.crt'
);

Then I queried the table with select * from company and got the error below in the HiveServer log:

Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - 
typically this happens if the network/Elasticsearch cluster is not accessible or 
when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
...
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error 
(check network and/or proxy settings)- all nodes failed; tried [[101.222.154.11:9200]]

Can someone tell me if I am missing anything here? Is there a compatibility issue between "elasticsearch-hadoop-8.8.1.jar" and Elasticsearch 8.8.1?

The non-default configuration entries are ones I added. The issue remains after I comment them all out and leave only the defaults.

In addition, I tried changing 'es.nodes' = '101.222.154.11' to the private IP of the VM in the Hive table properties when creating a new table. It did not help.

The network testing was done via Telnet from both headnodes, and it connected to port 9200 without issue.

Even curl requests to create an index, define a mapping, and put and get documents against the Elasticsearch service on the Azure VM complete successfully via both the public and private IPs.

So I suspect that either I did something wrong with the Hive table properties, or there are some configs I need to change in the elasticsearch.yml file.

Note: I tested multiple versions as well. ES 8.8.1 and ES 8.0.0 both give the same error shown in the description.

Let me know if you spot anything wrong here or have other ideas to try out.

Any advice is greatly appreciated!

Hi there,

The issue was resolved after adding the right version of the httpclient jar. I found the correct commons-httpclient.jar file by running "locate httpclient" on the headnode.
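
For anyone repeating this, a rough sketch of that search follows. It narrows the "locate httpclient" output to commons-httpclient jars and prints matching "add jar" statements to paste into Beeline. The helper names are hypothetical, and the jar that is actually right depends on what the cluster ships.

```shell
#!/bin/sh
# Keep only commons-httpclient jar paths from a list of paths on stdin.
filter_httpclient_jars() {
  grep -E '/commons-httpclient[^/]*\.jar$' | sort -u
}

# Turn each jar path on stdin into a Beeline "add jar" statement.
to_add_jar() {
  sed -e 's/^/add jar /' -e 's/$/;/'
}
```

On the headnode this would be run as: locate httpclient | filter_httpclient_jars | to_add_jar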

As for the network side of things, I kept it simple by disabling all configuration relating to X-Pack, then setting only network.host and http.port, along with the cluster master node.
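
The final elasticsearch.yml is not shown here, so the following is only one reading of that simplification, written as a small shell helper that generates the file. It assumes "disabling xpack" means setting xpack.security.enabled to false, and it reuses cluster.name and discovery.type from the original config; the helper name is hypothetical.

```shell
#!/bin/sh
# Write a minimal single-node config to the file given as $1 (hypothetical helper).
write_minimal_es_config() {
  cat > "$1" <<'EOF'
cluster.name: bob-es
discovery.type: single-node
network.host: 0.0.0.0
http.port: 9200
# Assumption: "disabling xpack" means turning security off explicitly.
xpack.security.enabled: false
EOF
}
```

With security turned off the node serves plain HTTP without authentication, so under this assumption the es.net.ssl and es.net.http.auth.* table properties would no longer apply.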

Afterwards, I was able to query data records from ES.
