Retrying failed action with response code: 429


(Jakov Sosic) #1

Hi guys,

I get a lot of messages in the Logstash log that look like:

retrying failed action with response code: 429

I'm using redis as input, and elasticsearch as output and my logstash.conf is pretty straightforward:

input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
    codec => json
    threads => 16
  }
}

output {
  elasticsearch {
    hosts => "127.0.0.1:9200"
    index => "logstash-%{app}-%{+YYYY.MM.dd}"
    codec => "plain"
    workers => 16
    flush_size => 1000
    idle_flush_time => 1
  }
}

Any ideas what could be the cause for these errors?

Elasticsearch has 16 GB of RAM available out of the 32 GB on the system, I/O is light, and CPU load on the VM is also low (<0.5)...

Also, I'm not seeing all the logs I expect to see in Kibana, and I don't know whether this could be the cause or whether I have issues on the agent side.

I'm running these versions:

Elasticsearch: 1.5.2
Logstash: 2.0.0 (was running 1.5.x with the same problems)

How can I debug this further?


(Aaron Mildenstein) #2

I would remove these 4 lines from the elasticsearch output (codec, workers, flush_size and idle_flush_time) and see what happens with the defaults.

16 workers is generally far more than necessary. I wouldn't make this setting more than 2 unless you're doing over 10,000 events per second. Flush size is also too big. Because of the retry logic (which is why you get 429 response codes), you should probably work in smaller batches (I believe the default is 512 now). The plain codec simply doesn't do anything here, as elasticsearch requires JSON.

I also note that you are separating your indices by app. How many "apps" do you have per day? How many indices do you have, total, on your cluster? What's your data retention policy? Are you using the default 5+1 shard count? I ask these questions because having too many shards on a single node can overload a node's index management ability. It only gets to use a percentage of the heap for this, and exhausting the memory creates pressure which can dramatically affect index caching (which might be what's resulting in more 429s).
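To confirm that the 429s are bulk requests being rejected by Elasticsearch, one option is to look at the bulk thread pool counters. The following is a minimal Python sketch, not something from the thread: it assumes the single local node at 127.0.0.1:9200 from the original post and the `_cat/thread_pool` column names used by Elasticsearch 1.x (`bulk.active`, `bulk.queue`, `bulk.rejected`). The helper names are made up for illustration.

```python
from urllib.request import urlopen

# Assumed endpoint: the single local node from the original post.
# The h= column names are those of Elasticsearch 1.x; newer versions differ.
CAT_URL = ("http://127.0.0.1:9200/_cat/thread_pool"
           "?v&h=host,bulk.active,bulk.queue,bulk.rejected")


def parse_cat_table(text):
    """Turn whitespace-separated _cat output (with a header row) into dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def bulk_rejections(text):
    """Return {host: rejected_count} for hosts with a non-zero bulk.rejected."""
    result = {}
    for row in parse_cat_table(text):
        rejected = int(row.get("bulk.rejected", 0))
        if rejected > 0:
            result[row["host"]] = rejected
    return result


def fetch_cat_thread_pool(url=CAT_URL):
    """Fetch the raw _cat/thread_pool text (requires a reachable node)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

Calling `bulk_rejections(fetch_cat_thread_pool())` a few times while Logstash is indexing would show whether the rejected counter keeps growing. If it does, the node is turning bulk requests away because its bulk queue is full, and smaller batches or fewer workers are the first things to try.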


(Jakov Sosic) #3

I have around 720 shards. I keep number_of_shards: 5 and number_of_replicas at 0. I have only 1 node in my ES cluster.

I've removed those lines and so far I don't see anything in logstash logs... I dropped all of my indexes and am now reindexing logs...


(Aaron Mildenstein) #4

720 shards is quite a few for a single node, as is 5 shards per index on a single node. For a single node, I would suggest that you only need 1 shard, maybe 2.

How many "apps" do you have per day (i.e., how many indices are created each day)? This number will have a profound impact as it dictates the number of "active" shards, which want more of the index cache.
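The thread does not say how the shard count should be changed. One common way is an index template, which only affects indices created after it is installed. The sketch below is an assumption-laden illustration: it uses the Elasticsearch 1.x template format (the `template` key holds the index pattern), the template name `logstash-single-shard` is hypothetical, and the node address is taken from the original post.

```python
import json
from urllib.request import Request, urlopen


def single_shard_template(pattern="logstash-*", shards=1, replicas=0):
    """Build an Elasticsearch 1.x index template body for the given pattern."""
    return {
        "template": pattern,  # index name pattern (1.x template format)
        "order": 1,           # applied on top of templates with a lower order
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


def put_template(body, name="logstash-single-shard",
                 base_url="http://127.0.0.1:9200"):
    """PUT the template to the node (requires a reachable cluster)."""
    request = Request(
        "{}/_template/{}".format(base_url, name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Existing indices keep their shard count, so they either age out under the retention policy or have to be reindexed.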


(Jakov Sosic) #5

OK, I lowered it to 1 shard per index and I'm down to 51 shards. I keep logs for 10 days, create indices daily and have approximately 5-6 apps.


(Aaron Mildenstein) #6

Those numbers are much more tenable. You shouldn't exhaust your index cache with shard counts like those.
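As a rough check of the numbers in this thread, the shard count on the node can be estimated from the number of apps, the retention period and the shards per index. This is a back-of-the-envelope sketch that assumes one index per app per day, as implied by the index pattern in the original post; the function name is made up for illustration.

```python
def estimate_shards(apps, retention_days, shards_per_index, replicas=0):
    """Estimate total shards, assuming one index per app per day."""
    indices = apps * retention_days
    return indices * shards_per_index * (1 + replicas)


# 5-6 apps, 10 days of retention, 1 shard, no replicas:
print(estimate_shards(5, 10, 1))  # 50
print(estimate_shards(6, 10, 1))  # 60

# The same workload with the default 5 shards per index:
print(estimate_shards(6, 10, 5))  # 300
```

The reported 51 shards falls inside the 50 to 60 range for 1 shard per index, while the same retention with 5 shards per index would hold 250 to 300 shards on the single node.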

