Cannot detect ES Version - Elasticsearch/Cloudera Hive Connectivity

Hi,

I am setting up a Hive connection from Cloudera CDH 6.1.2 (bare metal cluster) to Elasticsearch 7.9.3 (running in an OpenShift environment).

I get the error below when selecting data from the table.

Error: java.io.IOException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only' (state=,code=0)

The Hive CREATE TABLE script contains the following configuration:

  'es.nodes'='http://<elasticsearch_service_in_openshift_URL>',
  'es.port' = '',
  'es.resource'='test_index',
  'es.index.read.missing.as.empty'='true',
  'es.mapping.date.rich'='false',
  'es.nodes.discovery' = 'true',
  'es.nodes.resolve.hostname' = 'false',
  'es.query'='?q=*');

I have also tried both enabling and disabling es.nodes.wan.only, but the error persists.

The Elasticsearch Hive JAR has already been added to CDH:

Configuration Location: HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml
Property Name: HIVE_AUX_JARS_PATH
Property Value: /opt/local/hive/es/es.jar

I have also tried placing elasticsearch-hadoop-hive-7.9.3.jar in:
/opt/cloudera/parcels/CDH/lib/hive/lib/.

The output below shows that curl can reach the OpenShift service URL without specifying a port:

curl http://<elasticsearch_service_in_openshift_URL>

{
  "name" : "es-cluster-1",
  "cluster_name" : "infra-efk",
  "cluster_uuid" : "gc6Fage-QH-3X6AFhEJzwg",
  "version" : {
    "number" : "7.9.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "c4138e51121ef06a6404866cddc601906fe5c868",
    "build_date" : "2020-10-16T10:36:16.141335Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Can someone help with this?

Hi @bob1985, thanks for your question.

My familiarity with CDH is limited, but I'm wondering about two things:

  1. How does authentication to ES work in your script? Shouldn't es.net.http.auth.user and es.net.http.auth.pass be set as well? I'm also not sure how your curl call works over plain http and without credentials. See the docs for more about getting those from ES.
  2. Can you try setting es.nodes.discovery to false as well?

Hi @dkow

Thank you for your reply.

I can reach 'http://<elasticsearch_service_in_openshift_URL>' without specifying port 9200, but 'es.port' defaults to 9200 when it is not set, so Hive was trying to connect on that port.

I resolved the issue by setting the port to 80, the default HTTP port that curl uses when the URL has no explicit port:

'es.port' = '80'
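
For anyone landing here later, this is roughly what the full table definition looks like with that change. It is only a sketch: the table name and the column list are placeholders I made up for illustration, and every property other than 'es.port' is copied unchanged from the script in my first post.

```sql
-- Sketch only: the table name (es_test_index) and the columns are hypothetical placeholders.
-- 'es.port' is set to 80 because the OpenShift service URL answers on the default HTTP port;
-- when 'es.port' is not set, the connector falls back to 9200.
CREATE EXTERNAL TABLE es_test_index (
  id STRING,
  message STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes' = 'http://<elasticsearch_service_in_openshift_URL>',
  'es.port' = '80',
  'es.resource' = 'test_index',
  'es.index.read.missing.as.empty' = 'true',
  'es.mapping.date.rich' = 'false',
  'es.nodes.discovery' = 'true',
  'es.nodes.resolve.hostname' = 'false',
  'es.query' = '?q=*'
);
```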

Thank you.

Awesome that you got it working! Thanks for the update.