ECK Operator unable to verify basic license in OpenShift


I have the ECK Operator (2.1) running in OpenShift. Both Elasticsearch and Kibana are up and running properly with version 7.17.2, but the operator is stuck on trying to verify the (basic) license for Elasticsearch. I get the message "Could not verify license, re-queuing: Elasticsearch client failed for [...] connect: connection timed out." Am I missing a setting somewhere or something?

What's the health of your Elasticsearch cluster?

This message should mean that the operator cannot reach Elasticsearch, so the license cannot be verified.

It's harmless to get this message during startup, but if it persists, there's something wrong with your Elasticsearch cluster.

Yeah, the weird part is the cluster is happy as a clam. From a GET _cluster/health call in Kibana:

  "cluster_name" : "elastic",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 23,
  "active_shards" : 46,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0

All of the indices show as green, as well. I am seeing occasional messages in the elastic pods about plain text requests over a secure channel and an empty client certificate chain, so maybe the operator isn't passing the certificate correctly? I had thought the operator would handle all of that internally, but maybe I overlooked something.

I am experiencing exactly the same issue. Have you made any progress on it? When I curl the license endpoint from within an elastic pod, I receive a valid response (unauthorized, but no timeout).

I think this error is also the reason I cannot apply the enterprise trial license (following this documentation: Manage licenses in ECK | Elastic Cloud on Kubernetes [2.2] | Elastic).

No, unfortunately I haven't found a solution yet. I'm still getting the same error.

Could you try the same from the operator Pod? (elastic-system/elastic-operator-0 when deployed with, I can't remember the name when deployed using OLM though)

I am seeing occasional messages in the elastic pods about plain text requests over a secure channel and an empty client certificate chain, so maybe the operator isn't passing the certificate correctly?

The operator automatically sets up the connection to Elasticsearch, including the TLS settings, so I think it is unlikely that these logs are generated by the operator.

A few questions:

  • Is there any network policy in place?
  • Did you change the selector used in the http.service.spec field? (or any other field in the http section, including http.tls)
  • Could you share the Elasticsearch resource specification?

I don't have access to the elastic operator pod in typical circumstances. I'll try to get the attention of the team that handles it.

We do have fairly strict firewall policies, but they've never stopped anything within a project before. And the "ElasticsearchIsReachable" status passes.

I have not changed the http.service.spec field. I did try changing the tls.certificate value to see if I could manually select the correct certificate, but it didn't work and I put it back.

Here's the spec:

kind: Elasticsearch
  annotations: '{"no_transient_settings":false}' 'elastic-es-elastic-0,elastic-es-elastic-1,elastic-es-elastic-2'
  name: elastic
    app: elastic
      metadata: {}
      spec: {}
      certificate: {}
    logs: {}
    metrics: {}
    - config:
        node.attr.attr_name: attr_value
          - master
          - data false
      count: 3
      name: elastic
          creationTimestamp: null
            - name: elasticsearch
                  cpu: 2
                  memory: 1000Mi
                  cpu: 1
                  memory: 1000Mi
      metadata: {}
      spec: {}
      certificate: {}
    changeBudget: {}
  version: 7.17.2
  volumeClaimDeletePolicy: DeleteOnScaledownOnly
  availableNodes: 3
    - lastTransitionTime: '2022-04-25T15:34:14Z'
      message: 'Could not verify license, re-queuing'
      status: 'False'
      type: ReconciliationComplete
    - lastTransitionTime: '2022-04-25T15:30:58Z'
      message: All nodes are running version 7.17.2
      status: 'True'
      type: RunningDesiredVersion
    - lastTransitionTime: '2022-04-25T15:32:04Z'
      message: Service [...]/elastic-es-internal-http has endpoints
      status: 'True'
      type: ElasticsearchIsReachable
  health: unknown
      lastUpdatedTime: '2022-04-07T19:01:50Z'
      nodes: []
      lastUpdatedTime: '2022-04-07T19:01:50Z'
      nodes: []
      lastUpdatedTime: '2022-04-25T15:30:58Z'
      nodes: []
  observedGeneration: 15
  phase: ApplyingChanges
  version: 7.17.2

ElasticsearchIsReachable means that some endpoints are available to connect to Elasticsearch (at least one Pod in the cluster is Ready to receive connections). It does not mean that the operator has successfully used them to connect to the cluster.
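To make that distinction concrete, here is a minimal Python sketch that reads the three conditions quoted in the status above and flags the situation described: endpoints exist, yet reconciliation is not completing. The helper names are hypothetical, the condition data is copied from the post, and treating a False ReconciliationComplete as a sign that the operator's own requests are failing is an interpretation of this thread, not a documented rule.

```python
# Conditions copied from the status shared above (namespace elided as in the post).
conditions = [
    {"type": "ReconciliationComplete", "status": "False",
     "message": "Could not verify license, re-queuing"},
    {"type": "RunningDesiredVersion", "status": "True",
     "message": "All nodes are running version 7.17.2"},
    {"type": "ElasticsearchIsReachable", "status": "True",
     "message": "Service [...]/elastic-es-internal-http has endpoints"},
]


def failing_conditions(conds):
    """Return the conditions whose status is anything other than 'True'."""
    return [c for c in conds if c["status"] != "True"]


def endpoints_exist_but_reconcile_fails(conds):
    """True when the Service has endpoints but reconciliation is not complete.

    Assumption for this sketch: that combination points at the path between
    the operator and Elasticsearch rather than at the cluster itself.
    """
    status = {c["type"]: c["status"] == "True" for c in conds}
    return (status.get("ElasticsearchIsReachable", False)
            and not status.get("ReconciliationComplete", False))
```

With the conditions from this thread, `endpoints_exist_but_reconcile_fails(conditions)` is true and the only failing condition is ReconciliationComplete, which matches the symptoms reported.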

 health: unknown

The operator does not seem to be able to get the cluster health either. This probably has the same root cause as the license error. Could you double-check the connectivity between the operator and the cluster?
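One rough way to run that check is a plain TCP probe from a Pod in the operator's namespace. The sketch below assumes a Python interpreter is available there, which may not be the case in the operator container itself. The function name `probe_tcp` is hypothetical, and 9200 is the default Elasticsearch HTTP port. It only tells a timeout (packets dropped, for example by a firewall or network policy) apart from a refused or successful connection. It does not speak TLS or authenticate.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt a TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except (socket.timeout, TimeoutError):
        # No answer at all: the symptom reported in the operator log.
        return "timed out"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"


# Example call, with the namespace left as a placeholder because it is
# elided in the post:
#   probe_tcp("elastic-es-internal-http.<namespace>.svc", 9200)
```

If the probe returns "connected" from an elastic pod but "timed out" from the operator's namespace, that would be consistent with something between the two namespaces dropping traffic.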