We are setting up an Elasticsearch cluster on GKE with the following layout:
Master nodes as Kubernetes Deployments
Client nodes as Kubernetes Deployments with an HPA
Data nodes as StatefulSets with PersistentVolumes
We were able to set up the cluster itself without trouble, but we are struggling to configure the snapshot backup mechanism. Essentially, we are following this guide. We can follow it up to the step of obtaining the secret JSON key, but after that we are not sure how to add the key to the Elasticsearch keystore and proceed. We have been stuck on this for quite some time and the documentation has not been great: every doc says to add the JSON key to the elasticsearch keystore, but none explains how to do that when the JSON file is on our local shell while the keystore lives inside the ES pods. We have also created a custom Dockerfile to install the GCS plugin. Really looking for some help here.
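For reference, the commands we believe we need to run against each pod look roughly like this (this assumes the repository-gcs plugin; the pod name, namespace, and file paths are placeholders):

# Copy the service-account key from the local shell into the pod
kubectl cp gcs-key.json default/es-data-0:/tmp/gcs-key.json

# Add it to the keystore under the setting the GCS plugin reads
kubectl exec es-data-0 -- bin/elasticsearch-keystore add-file gcs.client.default.credentials_file /tmp/gcs-key.json

# Ask the running nodes to pick up the new secure setting
curl -X POST "localhost:9200/_nodes/reload_secure_settings"

Is this roughly right, and is there a way to avoid repeating it by hand every time a pod is rescheduled?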
The orchestration of an Elasticsearch cluster is not simple, and it's easy to get it wrong in ways that occasionally lose data. I recommend using the official operator rather than trying to develop your own orchestration.
One of the core reasons we did not go ahead with the official operator is that we were not sure whether we could configure it to operate at loads of around 100k RPS for both reads and writes.
It seems that we should really try this out. I would love to hear your suggestions about what sort of configuration we should use. Our existing ES cluster holds about 500 GB of data and serves about 100k RPS. We are planning to use 8 vCPU / 32 GB machines for our Kubernetes ES cluster so that we can have a heap size of about 14-15 GB. Can you suggest some configuration tips based on your experience with this operator? Also, how does the operator take care of autoscaling, especially of the data and client nodes?
I do not think the orchestration mechanism should have any impact on performance. You should get the same cluster however it's orchestrated.
Benchmark your setup with a realistic workload. That's the only way you can truly validate its performance characteristics. Our public benchmarks show performance on some workloads in excess of 100k per second on a three-node benchmarking cluster, but performance is very dependent on your workload and hardware so you must perform your own experiments.
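If you want a starting point, Rally is the tool we use for those benchmarks. Something along these lines runs a load test against an existing cluster (the track name and target host are placeholders, and the exact command syntax varies between Rally versions):

esrally race --track=http_logs --target-hosts=es-cluster-es-http:9200 --pipeline=benchmark-only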
I don't think there's any auto-scaling yet. It doesn't seem necessary in such a small cluster.
I tried creating a cluster, but some of my pods are being OOM-killed, which is strange given that I am using 8 vCPU / 30 GB machines. I would have expected this to work well with the resources I am using.
Below is my elasticsearch.yaml:
Please use the </> button to format any YAML you are sharing properly. YAML is whitespace-sensitive and if you don't format it properly then it's quite meaningless.
Set Xmx and Xms to no more than 50% of your physical RAM. Elasticsearch requires memory for purposes other than the JVM heap and it is important to leave space for this...
Here "physical RAM" means "RAM allocated to the container". Your coordinating nodes have a 7GB heap on a 6GB container which is completely hopeless, and the other containers have heap size equal to container RAM which is still off by a factor of 2.
I have updated it and the cluster is running fine. I have created a ClusterIP service:
NAME                              TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/elastic-webhook-service   ClusterIP   10.64.4.214   <none>        443/TCP    4h10m
service/es-cluster-es-http        ClusterIP   10.64.7.198   <none>        9200/TCP   163m
But when I SSH into a node of the Kubernetes cluster (in another node pool, one that is not running the ES pods) and try curl commands, I get no reply from the server:

curl -X GET "10.64.7.198:9200/_cluster/health?pretty"
curl: (52) Empty reply from server

Any idea what is happening? I am not sure whether my cluster is actually running. How do I create indexes and insert data?
kubectl -n elastic-system get elasticsearch:
NAME         HEALTH   NODES   VERSION   PHASE         AGE
es-cluster   green    10      7.2.0     Operational   2h
That sounds like possibly a network config issue, but I'm not the best person to ask about this. I've moved this post over to the ECK forum and hopefully someone else can help with the details here.
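One thing worth checking, though: ECK enables TLS and basic authentication by default, so a plain-HTTP, unauthenticated curl gets exactly that kind of empty reply. A quick test would look something like this (the secret name follows ECK's <cluster-name>-es-elastic-user convention; -k skips certificate verification and is for testing only):

# Fetch the generated password for the built-in elastic user
PASSWORD=$(kubectl -n elastic-system get secret es-cluster-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')

# Query the cluster over HTTPS with basic auth
curl -u "elastic:$PASSWORD" -k "https://10.64.7.198:9200/_cluster/health?pretty"

# Creating an index and inserting a document work the same way
curl -u "elastic:$PASSWORD" -k -X PUT "https://10.64.7.198:9200/my-index"
curl -u "elastic:$PASSWORD" -k -X POST "https://10.64.7.198:9200/my-index/_doc" -H 'Content-Type: application/json' -d '{"field":"value"}'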
Yeah, got it working. Thanks a lot; this entire thread has been super useful, and my cluster is up and running. My one remaining concern is that there is no autoscaling: if my pods approach their CPU limits, there is nothing like a Kubernetes Horizontal Pod Autoscaler to automatically schedule more pods, so I will have to monitor the cluster myself for each possible bottleneck and scale it manually.