I have a k8s cluster which has 3 Elasticsearch nodes deployed, and this ES cluster is used by another pod.
In the logs of my pod I see logs like
Node <Urllib3HttpNode(http://es.elastic-system.svc:9200)> has failed for 1 times in a row, putting on 1 second
Retrying request after non-successful status 429
But in the Elasticsearch cluster there are no logs related to any request getting rejected.
Even in the thread_pool API, I see the rejected count as 0:
_cat/thread_pool?v&s=t,n&h=type,name,node_name,active,queue,rejected,completed
Can someone please help me with the next steps to debug this?
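One caveat with a single call to that API: it is a point-in-time snapshot, so brief bursts of rejections between checks can be missed. It may help to poll it repeatedly and scan for any nonzero rejected count across all pools and nodes; it may also be worth checking GET /_nodes/stats/breaker, since circuit-breaker trips can surface as 429s without incrementing the thread pool rejected counters. A minimal sketch of such a scan (the sample output below is made up, not from your cluster):

```python
# Scan _cat/thread_pool output for any nonzero "rejected" counts.
# The sample text is illustrative only, not real cluster output.
sample = """\
type  name   node_name       active queue rejected completed
fixed write  elasticsearch-0      2     0        0     81234
fixed write  elasticsearch-1      1     0       17     79011
fixed search elasticsearch-2      0     0        0     45210
"""

def nonzero_rejections(cat_output: str) -> list[tuple[str, str, int]]:
    """Return (node_name, pool_name, rejected) for every row with rejected > 0."""
    lines = cat_output.strip().splitlines()
    idx = {col: i for i, col in enumerate(lines[0].split())}
    hits = []
    for line in lines[1:]:
        cols = line.split()
        rejected = int(cols[idx["rejected"]])
        if rejected > 0:
            hits.append((cols[idx["node_name"]], cols[idx["name"]], rejected))
    return hits

print(nonzero_rejections(sample))  # -> [('elasticsearch-1', 'write', 17)]
```

Running something like this in a loop every few seconds while the client logs 429s would tell you whether rejections are happening at all, and on which node and pool.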
Which version of Elasticsearch are you using?
What type of operation/request is resulting in the 429?
What is the size and specification of the cluster (CPU, RAM, type of storage used)?
What load is the cluster under? What is the use case?
As outlined in this old blog post (not sure how applicable it is to newer versions), 429s generally mean that you are overloading the cluster. This can be due to a spike in traffic or a lack of resources.
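Whatever the root cause turns out to be, a 429 is the cluster asking the client to slow down, so the usual client-side mitigation is retrying with exponential backoff (which your client already appears to do with a fixed delay). A rough sketch of the idea; the `send` callable here is a hypothetical stand-in for your actual request:

```python
import random
import time

def with_backoff(send, max_retries=5, base=0.5):
    """Call `send` (returns an HTTP status code); on 429, wait and retry.

    Waits base * 2**attempt seconds plus a little random jitter so that
    many clients don't all retry at the same instant.
    """
    for attempt in range(max_retries):
        status = send()
        if status != 429:
            return status
        time.sleep(base * (2 ** attempt) + random.uniform(0, 0.1))
    return 429  # still throttled after all retries
```

The jitter matters when several client pods share one cluster: synchronized retries just reproduce the original spike.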
We are using version 8.14.3.
CPU for each node is set to 12 cores and RAM to 22 GiB.
I'm not sure which operation it is at the moment.
It is possible that it is due to CPU and memory load, but the thread pool API shows the rejected count as 0, so I'm not sure that resource load is the definite cause.
Can you provide some details around this?
What type of storage are you using? Elasticsearch can be very I/O intensive and storage performance is often the limiting factor rather than CPU or RAM.
The storage class is GCP's standard-rwo, with 50 GB for each node.
For the load, I used this API and got the following output:
{{baseUrl}}/_cat/nodes?v=true&s=cpu:desc&h=ram.current,ram.max,ram.percent,name,disk.used,cpu
From it I can derive the following values:
ram.current ram.max ram.percent name disk.used cpu
20.1gb 21.4gb 94 elasticsearch-2 290.5mb 7
20.1gb 21.4gb 94 elasticsearch-0 307.6mb 5
16gb 21.4gb 74 elasticsearch-1 167.2mb 5
I am not very familiar with GCP storage classes, but based on a quick search that seems like very slow storage. If so, it could explain the poor performance.
I was more referring to indexing and search load. How many indexing and update operations do you perform per second (includes deletes)? How heavy is the search load?
How many indices and shards is your data distributed across? What is the average shard size?
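If there is no monitoring in place, one way to estimate the indexing and search rates is to take two snapshots of GET /_nodes/stats/indices a fixed interval apart and diff the cumulative counters. A rough sketch, assuming the standard response shape (the sample dicts are made up for illustration):

```python
# Derive ops/sec from two snapshots of GET /_nodes/stats/indices.
# index_total and query_total are cumulative counters per node.
def per_second_rates(before: dict, after: dict, interval: float) -> dict:
    """Return indexing and search ops/sec summed across all nodes."""
    def total(stats, section, counter):
        return sum(node["indices"][section][counter]
                   for node in stats["nodes"].values())
    return {
        "index_ops_per_sec": (total(after, "indexing", "index_total")
                              - total(before, "indexing", "index_total")) / interval,
        "search_ops_per_sec": (total(after, "search", "query_total")
                               - total(before, "search", "query_total")) / interval,
    }

# Illustrative snapshots taken 60 seconds apart:
before = {"nodes": {"n1": {"indices": {"indexing": {"index_total": 1000},
                                       "search": {"query_total": 500}}}}}
after = {"nodes": {"n1": {"indices": {"indexing": {"index_total": 1600},
                                      "search": {"query_total": 620}}}}}
print(per_second_rates(before, after, 60))  # -> 10 index ops/sec, 2 search ops/sec
```

GET /_cat/indices?v&h=index,pri,rep,docs.count,store.size would similarly answer the shard-count and shard-size questions.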