It seems difficult to set up an ECK Elasticsearch cluster where client sniffing works. Does anybody have any suggestions on the best approach here? Our clusters currently run on AWS EKS across 3 availability zones, behind a single load balancer with cross-zone load balancing enabled. We seem to suffer from slow detection when nodes fail, and I don't think there's any retry on connection failure (I can't recall whether any of the clients provide that anyway; we use the Python client, FWIW).
Hi,
There is an open issue tracking full support for client sniffing with ECK: https://github.com/elastic/cloud-on-k8s/issues/3182
Could you share your Elasticsearch manifest and the client configuration (load balancer used, sniffer_timeout, TLS settings...)?
My question is similar to the original poster's: what is the best practice for connecting to ES on Kubernetes with REST clients?
- We are running Elasticsearch 7.4.2 on Kubernetes (AWS EKS), deployed via the elasticsearch Helm chart, version: 8.0.0-SNAPSHOT, sources: https://github.com/elastic/elasticsearch
- We're using the Java RestHighLevelClient to query ES.
- Our ES cluster has 3 data nodes, with 1 dedicated master in dev and 3 dedicated masters in prod.
Our current approach is:
- A dedicated REST client for sniffing; this always sniffs through the load balancer.
- E.g. hit http://elasticsearch.io:9200/_nodes/http and save the "publish_address" of each data node.
- Example /_nodes/http response (one node shown):
```json
"jyLP4JCyQnuq4BvvnSysSA": {
  "name": "elasticsearch-data-1",
  "transport_address": "10.20.3.140:9300",
  "host": "10.20.3.140",
  "ip": "10.20.3.140",
  "version": "7.4.2",
  "build_flavor": "default",
  "build_type": "docker",
  "build_hash": "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
  "roles": [
    "ingest",
    "data"
  ],
  "attributes": {
    "xpack.installed": "true"
  },
  "http": {
    "bound_address": [
      "0.0.0.0:9200"
    ],
    "publish_address": "10.20.3.140:9200",
    "max_content_length_in_bytes": 104857600
  }
},
```
- The load balancer is a Classic Load Balancer, and it points only to the data nodes.
- Use the Sniffer with our search and indexing REST clients
- Set the search and indexing REST clients' NodeSelector to SKIP_DEDICATED_MASTERS
- Sniff interval: 4 min
- Sniff on fail delay: 1 min
Thus, every 4 min the sniffer "sniffs" through the load balancer and sets the search or index client's nodes to the es-data pod IP addresses. If a connection to one of the pod IP addresses fails, the sniffer looks for new nodes 1 minute later. If the search or index clients have a failing request, they retry it on the other nodes.
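For concreteness, here is a minimal sketch of the wiring described above using the 7.x Java low-level RestClient and Sniffer. The load balancer hostname and plain-HTTP scheme are taken from the example URL above, and the code is illustrative rather than our exact production setup:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.NodeSelector;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.sniff.ElasticsearchNodesSniffer;
import org.elasticsearch.client.sniff.SniffOnFailureListener;
import org.elasticsearch.client.sniff.Sniffer;

public class SniffingClientFactory {

    public static RestHighLevelClient build() {
        // Dedicated REST client used only for sniffing; it always goes through the load balancer.
        RestClient sniffingClient = RestClient.builder(
                new HttpHost("elasticsearch.io", 9200, "http")).build();

        // Listener that triggers an early sniff when a node is marked dead.
        SniffOnFailureListener failureListener = new SniffOnFailureListener();

        // Search/indexing client; skip dedicated masters so requests only go to data nodes.
        RestClientBuilder builder = RestClient.builder(
                new HttpHost("elasticsearch.io", 9200, "http"))
            .setNodeSelector(NodeSelector.SKIP_DEDICATED_MASTERS)
            .setFailureListener(failureListener);
        RestHighLevelClient searchClient = new RestHighLevelClient(builder);

        // Nodes sniffer calls GET /_nodes/http via the dedicated client and reads publish_address.
        ElasticsearchNodesSniffer nodesSniffer = new ElasticsearchNodesSniffer(
                sniffingClient,
                ElasticsearchNodesSniffer.DEFAULT_SNIFF_REQUEST_TIMEOUT,
                ElasticsearchNodesSniffer.Scheme.HTTP);

        // Update the search client's node list every 4 minutes,
        // or 1 minute after a connection failure.
        Sniffer sniffer = Sniffer.builder(searchClient.getLowLevelClient())
            .setNodesSniffer(nodesSniffer)
            .setSniffIntervalMillis(4 * 60 * 1000)
            .setSniffAfterFailureDelayMillis(60 * 1000)
            .build();
        failureListener.setSniffer(sniffer);

        return searchClient;
    }
}
```

Note that after the first sniff the search/index client talks to the pod publish addresses directly, bypassing the load balancer, and the Sniffer should be closed before the RestClient on shutdown.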
This ~should~ keep us covered during rolling deploys or a full cluster outage. However, is there a different way that is considered "best practice"? Any comments or concerns with the approach we have now?
Any experience the community can share would be greatly appreciated.