GCS repository creation issue for Elasticsearch on GKE

We are setting up an Elasticsearch cluster on GKE with the following topology:

  1. Master nodes as Kubernetes Deployments

  2. Client nodes as Kubernetes Deployments with HPA

  3. Data nodes as StatefulSets with PVs

We are able to set up the cluster well, but we are struggling to configure the snapshot backup mechanism. Essentially, we are following this guide. We are able to follow it up to the step of obtaining the secret JSON key. After that, we are not sure how to add this key to the Elasticsearch keystore and proceed further. We have been stuck on this for quite some time and the documentation has not been great: all the docs say to add this JSON key to the Elasticsearch keystore, but none of them explain how. The JSON file is on our local shell while the keystore is on the ES pods. Also, we have created a custom Dockerfile to install the GCS plugin. Really looking for some help here.

Are you using Elastic Cloud on Kubernetes? If so, you can get secrets into the keystore from the Kubernetes secret store.

We aren't using Elastic Cloud on Kubernetes. We created our Kubernetes cluster ourselves using the Elasticsearch Docker image.

The orchestration of an Elasticsearch cluster is not simple, and it's easy to get it wrong in ways that occasionally lose data. I recommend using the official operator rather than trying to develop your own orchestration.

If you insist on using your own images, you should use the elasticsearch-keystore command to add any secrets to the keystore.
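For example (a sketch only: the local file name `gcs-credentials.json` and pod name `es-data-0` are placeholders, not from this thread), you could copy the key into each pod and add it with the `elasticsearch-keystore` tool:

```shell
# Copy the service-account key from your local shell into the pod
# (file name and pod name are illustrative placeholders).
kubectl cp gcs-credentials.json es-data-0:/tmp/gcs-credentials.json

# Add it to the keystore under the setting name the repository-gcs
# plugin expects for the "default" client.
kubectl exec es-data-0 -- \
  bin/elasticsearch-keystore add-file \
  gcs.client.default.credentials_file /tmp/gcs-credentials.json

# Remove the plaintext copy afterwards.
kubectl exec es-data-0 -- rm /tmp/gcs-credentials.json
```

Note that a keystore modified this way does not survive pod restarts, so with self-managed images you would typically do this in an init container (or bake it into the image) rather than by hand.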

One of the core reasons why we did not go ahead with the official version is that we were not sure if we could configure it correctly to operate at loads such as 100k RPS for both reads and writes.
It seems that we should really try it out. I would love to hear your suggestions about what sort of configuration we should have. Our existing ES cluster holds about 500 GB of data and serves about 100k RPS. We are planning to use 8 vCPU / 32 GB machines for our Kubernetes ES cluster so that we can have a heap size of about 14-15 GB. Can you suggest some configuration tips based on your experience with this operator? Also, how does the operator take care of autoscaling, especially of the data and client nodes here?

I do not think the orchestration mechanism should have any impact on performance. You should get the same cluster however it's orchestrated.

Benchmark your setup with a realistic workload. That's the only way you can truly validate its performance characteristics. Our public benchmarks show performance on some workloads in excess of 100k per second on a three-node benchmarking cluster, but performance is very dependent on your workload and hardware so you must perform your own experiments.
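As a sketch of how such an experiment might be run (Rally is Elastic's benchmarking tool; the track name and target host below are illustrative placeholders, and older Rally versions take the flags without the `race` subcommand):

```shell
pip install esrally

# Benchmark an existing cluster; "benchmark-only" skips provisioning.
# Replace the track and target host with your own workload and endpoint.
esrally race --track=http_logs \
  --target-hosts=10.64.7.198:9200 \
  --pipeline=benchmark-only
```

A standard track is only a starting point; to validate 100k RPS you would build a custom track from your own documents and queries.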

I don't think there's any auto-scaling yet. It doesn't seem necessary in such a small cluster.

I tried creating a cluster, but some of my pods are being OOM-killed, which is strange given that I am using 8 vCPU / 30 GB machines. Ideally, I would have expected this to work well given the resources I am using.
Below is my elasticsearch.yaml:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
  namespace: elastic-system
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: es-cluster
  namespace: elastic-system
spec:
  version: 7.2.0
  nodes:
  # 3 dedicated master nodes
  - nodeCount: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
      cluster.remote.connect: false
    podTemplate:
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: es-cluster
                topologyKey: kubernetes.io/hostname
        nodeSelector:
          cloud.google.com/gke-nodepool: es-pool
        initContainers:
        - name: init-sysctl
          image: busybox:1.27.2
          command:
          - sysctl
          - -w
          - vm.max_map_count=2621441
          securityContext:
            privileged: true
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms5g -Xmx5g
          - name: NSS_SDB_USE_CACHE
            value: "no"
          resources:
            requests:
              memory: 2Gi
            limits:
              cpu: 4
              memory: 5Gi
  # 3 coordinating nodes
  - nodeCount: 3
    config:
      node.master: false
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: es-cluster
                topologyKey: kubernetes.io/hostname
        nodeSelector:
          cloud.google.com/gke-nodepool: es-pool
        initContainers:
        - name: init-sysctl
          image: busybox:1.27.2
          command:
          - sysctl
          - -w
          - vm.max_map_count=2621441
          securityContext:
            privileged: true
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms7g -Xmx7g
          - name: NSS_SDB_USE_CACHE
            value: "no"
          resources:
            requests:
              memory: 5Gi
            limits:
              cpu: 4
              memory: 6Gi
  # 4 data nodes
  - nodeCount: 4
    config:
      node.master: false
      node.data: true
      node.ingest: false
    podTemplate:
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: es-cluster
                topologyKey: kubernetes.io/hostname
        nodeSelector:
          cloud.google.com/gke-nodepool: es-pool
        initContainers:
        - name: init-sysctl
          image: busybox:1.27.2
          command:
          - sysctl
          - -w
          - vm.max_map_count=2621441
          securityContext:
            privileged: true
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms5g -Xmx5g
          - name: NSS_SDB_USE_CACHE
            value: "no"
          resources:
            requests:
              memory: 5Gi
            limits:
              cpu: 4
              memory: 5Gi
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi
        storageClassName: local-storage
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 0
  secureSettings:
    secretName: gcs-credentials
  http:
    service:
      spec:
        type: ClusterIP

Please use the </> button to format any YAML you are sharing properly. YAML is whitespace-sensitive and if you don't format it properly then it's quite meaningless.

You should read the guide to setting the heap size, in particular:

Set Xmx and Xms to no more than 50% of your physical RAM. Elasticsearch requires memory for purposes other than the JVM heap and it is important to leave space for this...

Here "physical RAM" means "RAM allocated to the container". Your coordinating nodes have a 7GB heap on a 6GB container which is completely hopeless, and the other containers have heap size equal to container RAM which is still off by a factor of 2.
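Concretely, a corrected sketch for one node set (the 8Gi figure here is illustrative, not from this thread) would pin the container's memory request and limit to the same value and set the heap to at most half of it:

```yaml
# Illustrative values only: heap = 50% of container memory.
env:
- name: ES_JAVA_OPTS
  value: -Xms4g -Xmx4g
resources:
  requests:
    memory: 8Gi
  limits:
    memory: 8Gi
```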

Sorry for the bad format. But thanks a lot. The cluster is up and running. I am going ahead with configuring snapshots now.


I have updated it and the cluster is running fine. I have created a ClusterIP service:

NAME                              TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/elastic-webhook-service   ClusterIP   10.64.4.214   <none>        443/TCP    4h10m
service/es-cluster-es-http        ClusterIP   10.64.7.198   <none>        9200/TCP   163m

But when I SSH into a node of the Kubernetes cluster (in another node pool, not the one running the ES cluster) and try curl commands, I get no reply from the server:

curl -X GET "10.64.7.198:9200/_cluster/health?pretty"
curl: (52) Empty reply from server

Any idea what is happening? I am not sure if my cluster is running. How do I create indices and insert data?

kubectl -n elastic-system get elasticsearch:

NAME         HEALTH   NODES   VERSION   PHASE         AGE
es-cluster   green    10      7.2.0     Operational   2h

That sounds like possibly a network config issue, but I'm not the best person to ask about this. I've moved this post over to the ECK forum and hopefully someone else can help with the details here.

Hello, please see the docs here:
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-accessing-elastic-services.html
for how to access it. In your case it looks like you will need to specify https and authenticate to connect to the ES cluster. If you run into any issues please let us know, we try to document the common ones people run into.
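Concretely, based on the linked docs (a sketch; the secret name follows ECK's `<cluster-name>-es-elastic-user` convention, so for this cluster it should be `es-cluster-es-elastic-user`):

```shell
# Retrieve the auto-generated password for the built-in "elastic" user.
PASSWORD=$(kubectl -n elastic-system get secret es-cluster-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')

# ECK serves HTTPS with a self-signed certificate by default,
# hence -k (or point curl at the cluster's CA cert instead).
curl -u "elastic:$PASSWORD" -k "https://10.64.7.198:9200/_cluster/health?pretty"
```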


Yeah, got it working. Thanks a lot; this entire thread has been super useful and my cluster is up and running. My one remaining concern is that there is no autoscaling: if my pods hit their CPU limits, there is nothing like the Kubernetes Horizontal Pod Autoscaler to automatically schedule more pods. I will have to monitor the cluster myself for each of the possible bottlenecks and then scale it manually.
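For the record, manual scaling with the operator is just a matter of editing `nodeCount` in the manifest and re-applying it (a sketch; the file and resource names are the ones used earlier in this thread):

```shell
# Edit elasticsearch.yaml: bump nodeCount for the data node set (e.g. 4 -> 6),
# then re-apply; the operator performs the change for you.
kubectl apply -f elasticsearch.yaml

# Watch the operator bring up the new nodes.
kubectl -n elastic-system get elasticsearch es-cluster -w
```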