Logstash getting 429 errors and data processing getting delayed

We are using Elasticsearch 6.8.2 and managing the Elasticsearch cluster through Kubernetes. We have 3 master nodes and 25 data nodes. The Kubernetes cluster is set up on bare-metal AWS EC2 instances; the r4.4xlarge instance type (16 vCPU, 122 GiB memory) is used for the k8s cluster.
Basically we are using Elasticsearch as a centralized log store. We have more than 500 applications writing logs to the Elasticsearch cluster. The setup is as follows: applications write logs asynchronously to RabbitMQ, and Logstash pulls the logs from RabbitMQ and persists them in Elasticsearch. For the 500+ applications there are 100+ indices that are updated while persisting the log data. Each index covers one month (index name format: logs-YYYY.MM) and is created automatically by Logstash using the default Logstash template; older indices are deleted by a scheduled job every 3 months. Each index has 5 shards and 1 replica per shard.

Logstash is getting 429 errors most of the time and log data persistence is getting delayed. We need advice on how to optimize our cluster for faster writing and processing of the log data and to minimize the occurrence of 429 errors.
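For reference, the 429 responses from Elasticsearch correspond to rejections from the write (bulk) thread pool on the data nodes. A request like the following shows per-node rejection counts (just a sketch; the column selection is our own choice):

```
GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected&s=rejected:desc
```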

I would recommend reading this rather old blog post, which I believe is still relevant to the problem you are seeing. Then have a look at your Logstash config and check whether the flows for different types of logs are separated or not. Ideally you want each pipeline to only send data to a few indices at most. Another way to fight the issue would be to consolidate indices, as you have a very large number of them actively being written to at any point in time.
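For example, a single elasticsearch output with a sprintf-style index name, something like the sketch below (the app_group field and index pattern are assumptions about your setup), means every bulk request gets spread across a large number of indices and shards:

```
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    # one output for all applications; the index name is resolved per event,
    # so a single bulk request can target many different indices
    index => "logs-%{[app_group]}-%{+YYYY.MM}"
  }
}
```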

Thank you very much for your suggestion. We have already gone through the suggested blog post. In our case a single Logstash pipeline is writing the log data. Logstash writes logs to different indices based on the application group. As applications from different products are writing logs, it is difficult to consolidate the indices or limit the writes to a few indices.

You could define multiple pipelines in Logstash, each handling output to a small number of indices, and then distribute data between them using pipeline-to-pipeline communication. That would likely increase the number of documents written to each shard as part of each batch.
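A minimal sketch of what that could look like, assuming a Logstash version that supports pipeline-to-pipeline communication and a made-up app_group field (pipeline ids, paths, hosts and index names are all placeholders):

```
# --- pipelines.yml ---
# one intake pipeline plus one output pipeline per group of indices
- pipeline.id: intake
  path.config: "/usr/share/logstash/pipeline/intake.conf"
- pipeline.id: product-a
  path.config: "/usr/share/logstash/pipeline/product-a.conf"
# ...one entry per downstream output pipeline, including a catch-all

# --- intake.conf ---
# reads from RabbitMQ and routes events to the downstream pipelines
input {
  rabbitmq {
    host  => "rabbitmq.example.com"   # placeholder
    queue => "app-logs"               # placeholder
  }
}
output {
  if [app_group] == "product-a" {
    pipeline { send_to => ["product-a"] }
  } else {
    pipeline { send_to => ["catch-all"] }   # defined the same way as product-a
  }
}

# --- product-a.conf ---
# only ever writes to the product-a indices, so each bulk request
# targets a small number of shards
input {
  pipeline { address => "product-a" }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "logs-product-a-%{+YYYY.MM}"
  }
}
```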

Apart from this I would recommend reducing the number of active indices and instead having each cover a shorter time period in order to limit its size.

@Christian_Dahlqvist Indices are already created monthly. Do we need to further reduce the time period? Maybe we will try this option: instead of a monthly index we can create one per week.
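If we go that route, the change would be the date pattern in the index name of the elasticsearch output; a sketch (the index prefix is a placeholder, and %{+xxxx.ww} uses the Joda week-year/week-of-year format):

```
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    # weekly instead of monthly indices; "logs-product-a" is a placeholder prefix
    index => "logs-product-a-%{+xxxx.ww}"
  }
}
```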

If you combine indices in order to reduce the number of indices written to, you may need to adjust the time period covered if they get too large, e.g. over 50GB in size. It all depends on how much data you are indexing daily.
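One way to keep an eye on that is the _cat/indices API; a sketch (the index pattern and column selection are just one reasonable choice):

```
GET _cat/indices/logs-*?v&h=index,pri,rep,docs.count,pri.store.size&s=pri.store.size:desc
```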
