I have a k8s cluster which has 3 Elasticsearch nodes deployed, and this ES cluster is used by another pod.
In the logs of my pod I see logs like
Node <Urllib3HttpNode(http://es.elastic-system.svc:9200)> has failed for 1 times in a row, putting on 1 second
Retrying request after non-successful status 429
But in the Elasticsearch cluster there are no logs related to any request getting rejected.
Even in the thread_pool API, I see the rejected count as 0:
_cat/thread_pool?v&s=t,n&h=type,name,node_name,active,queue,rejected,completed
Can someone please help me with the next steps to debug this?
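One caveat with a single call to that API: it is a point-in-time snapshot, so brief bursts of rejections between checks can be missed. It may help to poll it repeatedly and scan for any nonzero rejected count across all pools and nodes; it may also be worth checking GET /_nodes/stats/breaker, since circuit-breaker trips can surface as 429s without incrementing the thread pool rejected counters. A minimal sketch of such a scan (the sample output below is made up, not from your cluster):

```python
# Scan _cat/thread_pool output for any nonzero "rejected" counts.
# The sample text is illustrative only, not real cluster output.
sample = """\
type  name   node_name       active queue rejected completed
fixed write  elasticsearch-0      2     0        0     81234
fixed write  elasticsearch-1      1     0       17     79011
fixed search elasticsearch-2      0     0        0     45210
"""

def nonzero_rejections(cat_output: str) -> list[tuple[str, str, int]]:
    """Return (node_name, pool_name, rejected) for every row with rejected > 0."""
    lines = cat_output.strip().splitlines()
    idx = {col: i for i, col in enumerate(lines[0].split())}
    hits = []
    for line in lines[1:]:
        cols = line.split()
        rejected = int(cols[idx["rejected"]])
        if rejected > 0:
            hits.append((cols[idx["node_name"]], cols[idx["name"]], rejected))
    return hits

print(nonzero_rejections(sample))  # -> [('elasticsearch-1', 'write', 17)]
```

Running something like this in a loop every few seconds while the client logs 429s would tell you whether rejections are happening at all, and on which node and pool.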
Which version of Elasticsearch are you using?
What type of operation/request is resulting in the 429?
What is the size and specification of the cluster (CPU, RAM, type of storage used)?
What load is the cluster under? What is the use case?
As outlined in this old blog post (not sure how applicable it is to newer versions), 429s generally mean that you are overloading the cluster. This can be due to a spike in traffic or a lack of resources.
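Whatever the root cause turns out to be, a 429 is the cluster asking the client to slow down, so the usual client-side mitigation is retrying with exponential backoff (which your client already appears to do with a fixed delay). A rough sketch of the idea; the `send` callable here is a hypothetical stand-in for your actual request:

```python
import random
import time

def with_backoff(send, max_retries=5, base=0.5):
    """Call `send` (returns an HTTP status code); on 429, wait and retry.

    Waits base * 2**attempt seconds plus a little random jitter so that
    many clients don't all retry at the same instant.
    """
    for attempt in range(max_retries):
        status = send()
        if status != 429:
            return status
        time.sleep(base * (2 ** attempt) + random.uniform(0, 0.1))
    return 429  # still throttled after all retries
```

The jitter matters when several client pods share one cluster: synchronized retries just reproduce the original spike.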
We are using version 8.14.3.
CPU for each node is set to 12 cores and RAM to 22 GiB.
I'm not sure which operation it is at the moment.
It is possible that it is due to CPU and memory load, but the thread pool API shows the rejected count as 0, so I'm not sure that resource load is the definite cause.
Can you provide some details around this?
What type of storage are you using? Elasticsearch can be very I/O intensive and storage performance is often the limiting factor rather than CPU or RAM.
The storage class is GCP's standard-rwo, with 50 GB for each node.
For the load, I used this API and got the following output:
{{baseUrl}}/_cat/nodes?v=true&s=cpu:desc&h=ram.current,ram.max,ram.percent,name,disk.used,cpu
From it I can derive the following values:
ram.current ram.max ram.percent name disk.used cpu
20.1gb 21.4gb 94 elasticsearch-2 290.5mb 7
20.1gb 21.4gb 94 elasticsearch-0 307.6mb 5
16gb 21.4gb 74 elasticsearch-1 167.2mb 5
I am not very familiar with GCP storage classes, but based on a quick search that seems like very slow storage. If so, it could explain the poor performance.
I was more referring to indexing and search load. How many indexing and update operations do you perform per second (includes deletes)? How heavy is the search load?
How many indices and shards is your data distributed across? What is the average shard size?
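If there is no monitoring in place, one way to estimate the indexing and search rates is to take two snapshots of GET /_nodes/stats/indices a fixed interval apart and diff the cumulative counters. A rough sketch, assuming the standard response shape (the sample dicts are made up for illustration):

```python
# Derive ops/sec from two snapshots of GET /_nodes/stats/indices.
# index_total and query_total are cumulative counters per node.
def per_second_rates(before: dict, after: dict, interval: float) -> dict:
    """Return indexing and search ops/sec summed across all nodes."""
    def total(stats, section, counter):
        return sum(node["indices"][section][counter]
                   for node in stats["nodes"].values())
    return {
        "index_ops_per_sec": (total(after, "indexing", "index_total")
                              - total(before, "indexing", "index_total")) / interval,
        "search_ops_per_sec": (total(after, "search", "query_total")
                               - total(before, "search", "query_total")) / interval,
    }

# Illustrative snapshots taken 60 seconds apart:
before = {"nodes": {"n1": {"indices": {"indexing": {"index_total": 1000},
                                       "search": {"query_total": 500}}}}}
after = {"nodes": {"n1": {"indices": {"indexing": {"index_total": 1600},
                                      "search": {"query_total": 620}}}}}
print(per_second_rates(before, after, 60))  # -> 10 index ops/sec, 2 search ops/sec
```

GET /_cat/indices?v&h=index,pri,rep,docs.count,store.size would similarly answer the shard-count and shard-size questions.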