Public SSL'ed access with Ingress not working

hey all, I have followed the quickstart guide from the master branch of the docs, and everything works perfectly well when I set the service type to LoadBalancer for Kibana and Elasticsearch. I am able to curl the endpoints (with the self-signed cert). However, when I create an Ingress resource, it appears as though all the backends fail their health checks, and the ingress refuses to route traffic to any of the pods.

Here are my related configs:

elastic.yaml

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.1.0
  http:
    service:
      spec:
        type: NodePort
        ports:
        - port: 9200
          targetPort: 9200
          protocol: TCP
    tls:
      selfSignedCertificate:
        subjectAltNames:
        - dns: myuniquedomain.ca
          ip: 34.98.124.3
  nodes:
  - nodeCount: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
    volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard

elastic_ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myuniquedomain-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: myuniquedomain-static-ip
spec:
  rules:
  - http:
      paths:
      - path: /elastic
        backend:
          serviceName: quickstart-es
          servicePort: 9200

However, the ingress shows all the backends as unhealthy.

[screenshot: GCP console showing all ingress backends as unhealthy]

Attempting to curl with the new cert now returns:

╰─ curl --cacert ca.pem -u elastic:$PW https://myuniquedomain.ca/elastic
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to myuniquedomain.ca:443 

I have an A-record on the domain correctly pointing to the static IP as well.

Am I missing something obvious?

Thanks,

--Gary

Hey Gary,

Can you share which ingress class you are using? I'm guessing you are using your cloud provider's default ingress: which one is it?

Generally, there are 3 ways to set up an ingress in front of Elasticsearch:

  1. Make the ingress forward TCP connections to Elasticsearch directly, so Elasticsearch can do the TLS termination. For example with the NGINX ingress, this can be done with SSL passthrough configuration.

  2. Terminate TLS at the ingress layer (with the ingress' own certificates), and make the ingress use HTTPS to reach Elasticsearch, which terminates that connection with its own certificates. For an example of how this can be achieved with the NGINX ingress, see this post. On GKE, it looks like an annotation allows it to be configured.

  3. Disable HTTPS at the Elasticsearch level, so the ingress uses HTTP to contact Elasticsearch. This is not currently supported with ECK.

Depending on your ingress class, you'll probably be able to do either 1. or 2. Rough sketches of both are below.
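
For option 1 with the NGINX ingress, a minimal sketch of an SSL-passthrough Ingress could look like the following. Note that passthrough also has to be enabled on the controller itself (the --enable-ssl-passthrough flag), and I'm reusing the quickstart-es service name and domain from your configs above as assumptions:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: es-ssl-passthrough
  annotations:
    # pass the raw TLS connection through to Elasticsearch,
    # which keeps doing its own TLS termination
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
  rules:
  - host: myuniquedomain.ca   # passthrough is SNI-based, so a host is required
    http:
      paths:
      - backend:
          serviceName: quickstart-es
          servicePort: 9200

For option 2 on GKE, the annotation mentioned above is possibly the app-protocols one, which tells the load balancer to speak HTTPS to the backend. A rough sketch, assuming the operator lets you set annotations on the HTTP service via http.service.metadata and that the service port is named https:

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  # version, nodes, tls, etc. as in the elastic.yaml at the top of this thread
  http:
    service:
      metadata:
        annotations:
          # the key must match the name of the service port
          cloud.google.com/app-protocols: '{"https": "HTTPS"}'
      spec:
        type: NodePort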

Please let us know :slight_smile:

Heya. Thanks for the response. I am using GCP's default Ingress class, which is their HTTP(S) Cloud Load Balancer. I will have a look at your links and report back.

Hello again!

I've figured out what was broken. It turns out that the default readinessProbe for the Kibana pod looks like this:

 kubectl describe deployment quickstart-kibana | grep Readiness
    Readiness:  http-get http://:5601/ delay=10s timeout=5s period=10s #success=1 #failure=3

Unfortunately, this returns a 302 to /login?next=%2f as evidenced by the pod logs:
[screenshot: Kibana pod logs showing GET / returning a 302]

Under Kubernetes, readinessProbes are allowed to return anything in the range 200 <= response_code < 400. Unfortunately, when creating an Ingress on GCP (the default HTTP(S) Cloud Load Balancer), it automatically creates a GCP HTTP health check against the readinessProbe path (in this case /), which must return a 200. See here
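
If you want to see that redirect for yourself before touching anything on the GCP side, something like this should show it (port-forwarding straight to the deployment; the quickstart-kibana name and plain HTTP on port 5601 are taken from the probe above):

 kubectl port-forward deployment/quickstart-kibana 5601 &
 # expect a 302 with "location: /login?next=%2F", i.e. exactly what the health check trips over
 curl -si http://localhost:5601/ | head -n 3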

The solution in this case ended up being a fix to the generated health check itself, since I'm unable to modify the readinessProbe of a running pod. To fix the associated health check, I had to update its request path to point at the easiest thing that returns a 200: the login page.

First, find the health check in question.

 gcloud compute health-checks list
NAME                            PROTOCOL
k8s-be-32228--6ac986474e4810bf  HTTP

Then update the HTTP health check to point at the endpoint that actually returns a 200:

  gcloud compute health-checks update http k8s-be-32228--6ac986474e4810bf --request-path=/login\?next=%2F

Then verify the health check has been updated by checking the request path:

 gcloud compute health-checks describe k8s-be-32228--6ac986474e4810bf
checkIntervalSec: 70
creationTimestamp: '2019-07-11T10:43:46.547-07:00'
description: Kubernetes L7 health check generated with readiness probe settings.
healthyThreshold: 1
httpHealthCheck:
  port: 32228
  proxyHeader: NONE
  requestPath: /login?next=%2F
id: '8253387266199309245'
kind: compute#healthCheck
name: k8s-be-32228--6ac986474e4810bf
selfLink: [redacted]
timeoutSec: 5
type: HTTP
unhealthyThreshold: 10

Five minutes after this, the health checks came alive and my site is now routing correctly using the default GCP ingress.

It may be worthwhile to change the default readinessProbe requestPath for the Kibana pods. I'm not sure how Azure/AWS handle health checks, but for GCP at least, this operator doesn't work out of the box with Ingress.
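
For anyone who would rather bake this into the manifest than patch the generated health check, a rough sketch of overriding the probe path via the Kibana podTemplate is below. Whether this version of the operator honours a readinessProbe override (and keeps the container name kibana) is an assumption on my part:

apiVersion: kibana.k8s.elastic.co/v1alpha1
kind: Kibana
metadata:
  name: quickstart
spec:
  version: 7.1.0
  nodeCount: 1
  elasticsearchRef:
    name: quickstart
  podTemplate:
    spec:
      containers:
      - name: kibana
        readinessProbe:
          httpGet:
            path: /login          # returns a plain 200, so the generated GCP health check passes
            port: 5601
          initialDelaySeconds: 10
          periodSeconds: 10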

Hey @tadgh, thanks for investigating this!

I guess GCP load balancers should probably adapt to the k8s readiness check convention at some point, and not the other way around?

Anyway, I think you're right we can still do something here. I just opened https://github.com/elastic/cloud-on-k8s/issues/1308 in our Github repository to track this.

Amazing! Thanks. Agreed, the GCP health check behavior doesn't make much sense to me, and it should probably conform to the Kubernetes-allowed response status codes (2xx/3xx). Either way, greatly appreciated :slight_smile:

Guys, I'd appreciate it if you could give some advice. We have exactly the same problem, but we need to expose Elasticsearch itself in addition to Kibana. Is there an appropriate endpoint in the Elasticsearch service that will return 200?

A GET on "/" should return 200 when an Elasticsearch node is healthy.

> curl --cacert tls.crt -u elastic:$pass https://$ip:9200/ -i
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 506

{
  "name" : "c1-es-wcglgs6nmk",
  "cluster_name" : "c1",
  "cluster_uuid" : "2ShivYvmSA-dK6n90Ih_wg",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "04116c9",
    "build_date" : "2019-05-08T06:20:03.781729Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Interesting. We deployed the basic installation and everything is exactly as described: it only fails when using the GKE ingress, but Elasticsearch responds fine when queried from within GKE.

@tadgh I'm wondering: in your original question you posted Service and Ingress details for the elastic service, but then confirmed that you modified the Kibana health check and it worked. Does the Elasticsearch service itself work for you now?

I was only using the GKE Ingress for Kibana. For Elasticsearch, I just set the service type to LoadBalancer and pointed DNS at the load balancer IP. Once I reworked the health check that was automatically generated by the Cloud HTTP(S) load balancer, ingress to Kibana worked. I haven't yet tried an Ingress object pointing directly at ES.
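
For completeness, the Elasticsearch-side setup described here is just a change of service type in the http section; a minimal sketch, reusing the quickstart names from the top of the thread:

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  # version, nodes, tls, etc. as in the elastic.yaml at the top of this thread
  http:
    service:
      spec:
        type: LoadBalancer   # expose ES directly via a cloud load balancer instead of the GKE Ingress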