It seems difficult to set up an ECK Elasticsearch cluster where client sniffing works. Does anybody have any suggestions on the best approach here? Our clusters currently run on AWS EKS across 3 availability zones, behind a single load balancer with cross-zone load balancing enabled. We seem to suffer from slow detection when nodes fail, and I don't think there's any retry on connection failure (I can't recall whether any of the clients provide that anyway; we use the Python client, FWIW).
Hi,
There is an open issue tracking full support for client sniffing with ECK: https://github.com/elastic/cloud-on-k8s/issues/3182
Could you share your Elasticsearch manifest and the client configuration (load balancer used, sniffer_timeout, TLS settings...)?
My question is similar to the original poster's: what is the best practice for connecting to ES on Kubernetes with REST clients?
- We are running Elasticsearch 7.4.2 on Kubernetes (AWS EKS), deployed via the elasticsearch Helm chart, version: 8.0.0-SNAPSHOT, sources: https://github.com/elastic/elasticsearch
- We're using the Java RestHighLevelClient to query ES.
- Our ES cluster has 3 data nodes, with 1 dedicated master in dev and 3 dedicated masters in prod.
Our current approach is:
- A dedicated REST client for sniffing; this always sniffs through the load balancer.
- E.g. hit http://elasticsearch.io:9200/_nodes/http and save the "publish_address" of each data node.
- Example /_nodes/http response (one node shown):
```json
"jyLP4JCyQnuq4BvvnSysSA": {
  "name": "elasticsearch-data-1",
  "transport_address": "10.20.3.140:9300",
  "host": "10.20.3.140",
  "ip": "10.20.3.140",
  "version": "7.4.2",
  "build_flavor": "default",
  "build_type": "docker",
  "build_hash": "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
  "roles": [
    "ingest",
    "data"
  ],
  "attributes": {
    "xpack.installed": "true"
  },
  "http": {
    "bound_address": [
      "0.0.0.0:9200"
    ],
    "publish_address": "10.20.3.140:9200",
    "max_content_length_in_bytes": 104857600
  }
},
```
- The load balancer is a Classic Load Balancer, and it points only to the data nodes.
- Use the Sniffer with our search and indexing REST clients
- Set the search and indexing REST clients' NodeSelector to SKIP_DEDICATED_MASTERS
- Sniff interval: 4 min
- Sniff on fail delay: 1 min
Thus, every 4 min the sniffer "sniffs" through the load balancer and sets the search or index client's nodes to the es-data pod IP addresses. If a connection to one of the pod IP addresses fails, the sniffer looks for new nodes 1 minute later. If the search or index clients have a failing request, they retry it on the other nodes.
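For concreteness, here is a minimal sketch of the wiring described above using the 7.x Java low-level RestClient and Sniffer. The load balancer hostname and plain-HTTP scheme are taken from the example URL above, and the code is illustrative rather than our exact production setup:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.NodeSelector;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.sniff.ElasticsearchNodesSniffer;
import org.elasticsearch.client.sniff.SniffOnFailureListener;
import org.elasticsearch.client.sniff.Sniffer;

public class SniffingClientFactory {

    public static RestHighLevelClient build() {
        // Dedicated REST client used only for sniffing; it always goes through the load balancer.
        RestClient sniffingClient = RestClient.builder(
                new HttpHost("elasticsearch.io", 9200, "http")).build();

        // Listener that triggers an early sniff when a node is marked dead.
        SniffOnFailureListener failureListener = new SniffOnFailureListener();

        // Search/indexing client; skip dedicated masters so requests only go to data nodes.
        RestClientBuilder builder = RestClient.builder(
                new HttpHost("elasticsearch.io", 9200, "http"))
            .setNodeSelector(NodeSelector.SKIP_DEDICATED_MASTERS)
            .setFailureListener(failureListener);
        RestHighLevelClient searchClient = new RestHighLevelClient(builder);

        // Nodes sniffer calls GET /_nodes/http via the dedicated client and reads publish_address.
        ElasticsearchNodesSniffer nodesSniffer = new ElasticsearchNodesSniffer(
                sniffingClient,
                ElasticsearchNodesSniffer.DEFAULT_SNIFF_REQUEST_TIMEOUT,
                ElasticsearchNodesSniffer.Scheme.HTTP);

        // Update the search client's node list every 4 minutes,
        // or 1 minute after a connection failure.
        Sniffer sniffer = Sniffer.builder(searchClient.getLowLevelClient())
            .setNodesSniffer(nodesSniffer)
            .setSniffIntervalMillis(4 * 60 * 1000)
            .setSniffAfterFailureDelayMillis(60 * 1000)
            .build();
        failureListener.setSniffer(sniffer);

        return searchClient;
    }
}
```

Note that after the first sniff the search/index client talks to the pod publish addresses directly, bypassing the load balancer, and the Sniffer should be closed before the RestClient on shutdown.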
This ~should~ keep us covered during rolling deploys or a full cluster outage. However, is there a different way that is considered "best practice"? Any comments or concerns with the approach we have now?
Any experience the community can share would be greatly appreciated.