The remote server returned an error: (429) Too Many Requests

I am creating a new topic on this as the old one was closed. I increased resources, but I am still getting this error at peak hours. I have 3000+ clients indexing documents using the NEST client, and I see a lot of connection failures with these errors.

Is there any configuration or pool size I have to increase in Elasticsearch when I add memory and CPU, to avoid this?

# ServerError: ServerError: 429Type: es_rejected_execution_exception Reason: "rejected execution of org.elasticsearch.transport.TransportService$7@16399044 on EsThreadPoolExecutor[index, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@49aa71ba[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 1521828]]"
# OriginalException: System.Net.WebException: The remote server returned an error: (429) Too Many Requests.
I also see this:

# OriginalException: System.Net.WebException: The remote server returned an error: (500) Internal Server Error.
  at System.Net.HttpWebRequest.EndGetResponse (IAsyncResult asyncResult) <0x40645d20 + 0x00197> in <filename unknown>:0 
  at System.Net.HttpWebRequest.GetResponse () <0x4063c200 + 0x00053> in <filename unknown>:0 
  at Elasticsearch.Net.HttpConnection.Request[TReturn] (Elasticsearch.Net.RequestData requestData) <0x4062a000 + 0x0029f> in <filename unknown>:0 
# Request:
<Request stream not captured or already read to completion by serializer. Set DisableDirectStreaming() on ConnectionSettings to force it to be set on the response.>
# Response:
<Response stream not captured or already read to completion by serializer. Set DisableDirectStreaming() on ConnectionSettings to force it to be set on the response.>

Hi,

As far as I know, it's not possible, and increasing pool sizes is not recommended.
Is your cluster well sized for 3000+ clients? How many documents are they indexing?

I have one node with 4 CPUs and 16GB RAM. According to the sar history, the system is usually 90% idle and has free memory:

[root@elastic01 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          15887        3084         218          44       12584       12484
Swap:          2047         281        1766

I am catching errors during indexing, and I see this error around 2,000 times at a certain hour. I'm not sure how many docs are being indexed at that time; is there any query to find out? I don't have X-Pack.
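Without X-Pack you can still get a rough indexing rate from the indices stats API (`GET /_stats/indexing`), which exposes a cumulative `index_total` counter: sample it twice and divide the difference by the interval. A sketch in Python, using invented snapshot numbers in place of real cluster responses:

```python
# Two hypothetical snapshots of GET /_stats/indexing, taken 60 s apart.
# The structure mirrors the indices stats response shape; the numbers
# are invented for illustration.
snapshot_t0 = {"_all": {"total": {"indexing": {"index_total": 1_500_000}}}}
snapshot_t1 = {"_all": {"total": {"indexing": {"index_total": 1_530_000}}}}

def indexing_rate(before: dict, after: dict, interval_s: float) -> float:
    """Docs indexed per second between two stats snapshots."""
    t0 = before["_all"]["total"]["indexing"]["index_total"]
    t1 = after["_all"]["total"]["indexing"]["index_total"]
    return (t1 - t0) / interval_s

print(indexing_rate(snapshot_t0, snapshot_t1, 60))  # 500.0
```

Running that at the problem hour would tell you roughly how many docs/second the node is absorbing when the rejections start.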

Each client indexes approximately 100-200 docs at a time.
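If each client pushes 100-200 docs per cycle, it's worth making sure they go through the `_bulk` API in bounded batches rather than as individual PUTs (the errors above show single-document PUTs), since each single-document request occupies its own slot in the 200-deep queue. A minimal, hypothetical batching helper:

```python
from typing import Iterable, Iterator, List

def chunk(docs: Iterable[dict], batch_size: int = 100) -> Iterator[List[dict]]:
    """Yield documents in fixed-size batches suitable for one _bulk request."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # trailing partial batch

batches = list(chunk([{"id": i} for i in range(250)], batch_size=100))
print([len(b) for b in batches])  # [100, 100, 50]
```

One bulk request of 100 docs consumes one queue slot instead of 100, which directly reduces pressure on the rejecting thread pool.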

I also get errors indicating that some docs are too large:

Invalid NEST response built from a unsuccessful low level call on PUT: /ueb-backups-2017.05/object/833-306-50123_FS1031_1495613153_98649
# Audit trail of this API call:
 - [1] BadResponse: Node: https://xxxx:443/ Took: 00:00:00.1639840
# OriginalException: System.Net.WebException: The remote server returned an error: (413) Request Entity Too Large.
  at System.Net.HttpWebRequest.EndGetResponse (IAsyncResult asyncResult) <0x400fbf60 + 0x00197> in <filename unknown>:0 
  at System.Net.HttpWebRequest.GetResponse () <0x400f24b0 + 0x00053> in <filename unknown>:0 
  at Elasticsearch.Net.HttpConnection.Request[TReturn] (Elasticsearch.Net.RequestData requestData) <0x400e02d0 + 0x0029f> in <filename unknown>:0 
# Request:
<Request stream not captured or already read to completion by serializer. Set DisableDirectStreaming() on ConnectionSettings to force it to be set on the response.>
# Response:
<Response stream not captured or already read to completion by serializer. Set DisableDirectStreaming() on ConnectionSettings to force it to be set on the response.>
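The 413 is a different problem from the 429: the request body exceeds the server's HTTP limit. In Elasticsearch that limit is `http.max_content_length` (100mb by default); if a proxy sits in front (the client is connecting on port 443, which suggests one), the proxy may enforce its own, smaller limit. A config sketch, assuming the limit is being hit on the Elasticsearch side (the value shown is illustrative):

```yaml
# elasticsearch.yml -- maximum HTTP request body size
# (default is 100mb; raise only if you genuinely need larger payloads)
http.max_content_length: 200mb
```

If the 413 comes from the proxy instead, the equivalent limit has to be raised there.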

Pool sizes are auto-scaled to the number of CPUs, so if you want to index more, I would say: add machines to your cluster :slight_smile:

But the CPU is idle most of the time; I don't see any CPU constraint. Why do I need more nodes or CPUs?

Also, is it better to scale up or scale out? Can't I just add more CPUs to the node?

Your errors are due to the queue capacity being exceeded:

es_rejected_execution_exception Reason: "rejected execution of org.elasticsearch.transport.TransportService$7@16399044 on EsThreadPoolExecutor[index, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@49aa71ba[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 1521828]]"

Here is the explanation: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/modules-threadpool.html

As a first step, try to add more CPU if you can, and monitor.
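Since a 429 is explicitly a retryable condition, the client side can also help: instead of counting a rejection as a failure, retry it with exponential backoff and jitter, so that thousands of clients don't all retry at the same instant and re-create the spike. A generic sketch in Python (the function and its parameters are illustrative and not part of NEST, which has its own retry configuration on `ConnectionSettings`):

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 0.5, cap: float = 30.0):
    """Exponential backoff with full jitter: the retry window doubles on
    each attempt (capped), and the actual delay is drawn uniformly from
    [0, window] so concurrent clients desynchronize their retries."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

# Ceiling of each retry window before jitter is applied:
print([min(30.0, 0.5 * 2 ** a) for a in range(5)])  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

The jitter matters here precisely because the errors cluster at one peak hour: synchronized retries from 3000+ clients would otherwise hammer the same 200-slot queue again.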

I have increased from 4 to 8 CPUs and from 16GB to 24GB RAM, and will see tonight. Is there any way to see if the pool size is now larger?

As I have the problem at a specific time when most appliances try to connect, would Kafka or another queuing mechanism help here?

You should see the "thread_pool" size when requesting the node info.

@see https://www.elastic.co/guide/en/elasticsearch/reference/5.4/cat-thread-pool.html#_thread_pool_fields
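The cat API returns plain-text columns, so it is easy to capture periodically and watch the `rejected` counter grow. A small Python sketch that parses a captured sample of `GET /_cat/thread_pool/bulk?h=node_name,name,size,queue_size,rejected` (the output line below is invented for illustration):

```python
# Hypothetical captured output; the numbers are invented.
sample = "elastic01 bulk 8 200 5021\n"

def parse_rejected(cat_output: str) -> dict:
    """Map node name -> rejected count for the requested thread pool."""
    rejected = {}
    for line in cat_output.strip().splitlines():
        node, pool, size, queue_size, rej = line.split()
        rejected[node] = int(rej)
    return rejected

print(parse_rejected(sample))  # {'elastic01': 5021}
```

A `size` column that now reads 8 instead of 4 would confirm the pool scaled with the added CPUs.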

Hope this helps.

After increasing to 8 CPUs and 24GB, the number of these errors dropped from 40,000 to 5,000 :slight_smile: But I'm still getting 5,000 errors; does that mean I need to add more CPUs?

Invalid NEST response built from a unsuccessful low level call on PUT: /ueb-metrics-2017.05/object/17cc979d-c1b5-4b48-bc01-cabdfabfc7e6_storage_2017.05.25_23.55
# Audit trail of this API call:
 - [1] BadResponse: Node: https://xxxxxx:443/ Took: 00:00:00.1782390
# ServerError: ServerError: 429Type: es_rejected_execution_exception Reason: "rejected execution of org.elasticsearch.transport.TransportService$7@517c7130 on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@46f81218[Running, pool size = 8, active threads = 8, queued tasks = 200, completed tasks = 992704]]"

That seems to be the right solution, yes :slight_smile:
Do you have only one server?

Yes, I have only one node. For simplicity I wanted to scale up and keep one node, but this leads to my second question.

If I add a second node, how do I load balance all the PUT requests from all the NEST clients (3000+) to the public address? I only have one public address.

The nodes are VMs on storage arrays, so the storage is RAID 5 and I don't need document replicas. Is it fine to just add an additional node with no replicas? Will the cluster load balance and store some docs on one node and others on the other?

This is not recommended, and the decreased write performance of RAID 5 might further explain your 429 count. Our official storage recommendations are here.

Elasticsearch can do a lot on a single node, but its real strength comes from its ability to act in a distributed way. Without replicas, you cannot expect to have a reliably distributed system. You'll need to add more nodes at some point to scale beyond what you have (as the 429s show). You also have a single point of failure with only one node. More nodes allow searching and indexing to continue in the event that one node goes offline (perhaps for an upgrade, not necessarily an outage).

As far as load balancing goes, all Elasticsearch nodes are capable of redirecting traffic to the node where the data lies.

You would do well to read The Definitive Guide, even if it is tied to Elasticsearch 2.x right now, just to understand the principles of clustering Elasticsearch more fully.

But in cloud/hyperconverged environments, where storage availability is not a problem because the storage is itself distributed, do you still recommend replicas? You would be duplicating the storage needs several times over, which is not very efficient.

If I just expose one node of the cluster on the public IP, won't that node be saturated, since it receives all the connections first, and hit some queue limit before redirecting to another node? This is about indexing new data from all the NEST clients; I don't need to load balance queries, because the number of read queries is very small.

I agree that it's not storage efficient, but there are different kinds of efficiency. Elasticsearch is a search engine, first and foremost. Adding extra replicas is not just for redundancy. It also improves search and read performance as there are multiple copies of the data to read from. This is one of the ways that a search engine can maintain millisecond-speed searches. If performance is not a concern for you in that way, then feel free to run without replicas.

That will depend on the traffic load, of course. If you're saturating the endpoint with both searches and index requests, then you should add more local nodes and use a load balancer (as you pointed out) to redirect the requests to the various nodes of the cluster. This is a very common use case.
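As a sketch of that load-balancer setup, a minimal nginx fragment fronting two nodes might look like this (hostnames, ports, and TLS details are placeholders, not taken from this thread):

```nginx
# nginx.conf (fragment) -- round-robin over the cluster's HTTP endpoints
upstream elasticsearch {
    server es-node1.internal:9200;
    server es-node2.internal:9200;
}

server {
    listen 443 ssl;
    # ... ssl_certificate directives omitted ...
    location / {
        proxy_pass http://elasticsearch;
    }
}
```

This spreads the incoming connections before they reach any single node's queue, instead of funneling everything through one node's HTTP endpoint.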

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.