We are experiencing 429 errors during bulk saves in our prod environment. The problem is intermittent, occurring only once a week, which makes it hard to reproduce in our test environment.
We have the following configuration in our cluster:
Data nodes: 2
Instance type: m6g.2xlarge.search
EBS volume size (GiB): 50
Additional information is as follows:
We have 13 indexes, and each index is configured with 5 shards and 1 replica.
Max doc count is 327,198, divided into about 65,000 docs per shard.
Max doc size is 1.72GB, divided into 172MB per shard.
The smallest doc count we have in one of our shards is 2 with a doc size of 24KB.
Our data grows gradually each day.
Current max heap per node is 16GB, and used heap is 5GB on one node and 2GB on the other.
I am not sure if we are using too many shards for our data. Is reducing our shards to 4 or 2 the best solution, or do we need to request 2 more nodes for our cluster? We are also saving 250 docs at a time per index. I am considering lowering that to 100 or 200 to check whether indexing improves.
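For trying out smaller batches, here is a minimal sketch of batching with exponential backoff on 429. It is not tied to any particular client: `send_bulk` is a hypothetical callable that you would wrap around your own bulk call and that returns the HTTP status, and the batch size, retry count, and delays are illustrative assumptions. Note that a bulk request can also come back with an overall success status while individual items are rejected with 429, which this sketch does not inspect.

```python
import time
from typing import Callable, Iterable, Iterator, List


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive batches of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_with_backoff(
    docs: Iterable[dict],
    send_bulk: Callable[[List[dict]], int],  # hypothetical: returns the HTTP status
    batch_size: int = 100,                   # assumed value, tune for your cluster
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send docs in batches, retrying a batch on HTTP 429 with exponential backoff.

    Returns the number of documents sent successfully.
    """
    sent = 0
    for batch in chunked(docs, batch_size):
        for attempt in range(max_retries + 1):
            status = send_bulk(batch)
            if status != 429:
                break
            if attempt == max_retries:
                raise RuntimeError("bulk request still rejected with 429 after retries")
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... with the defaults
        if status >= 400:
            raise RuntimeError(f"bulk request failed with HTTP {status}")
        sent += len(batch)
    return sent
```

Injecting `sleep` keeps the function easy to exercise in a test environment without real waiting, which may help given that the problem only shows up about once a week in prod.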
It looks like you might be using OpenSearch, which is not supported here. That said, it seems like you do not have enough IOPS to support the bulk indexing at peak times. You also have far too many primary shards for your data. A single primary shard per index would be much more appropriate and would likely perform a lot better. It could also have a positive impact on the indexing issue.
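To put rough numbers on the shard count, here is a small sketch using the figures from the question (13 indexes, 5 primaries, 1 replica, 2 data nodes). It assumes shards are spread evenly across the nodes, which may not hold exactly in practice.

```python
def total_shards(indexes: int, primaries: int, replicas: int) -> int:
    """Total shard copies: every primary plus `replicas` copies of it."""
    return indexes * primaries * (1 + replicas)


DATA_NODES = 2  # from the cluster configuration in the question

current = total_shards(indexes=13, primaries=5, replicas=1)
proposed = total_shards(indexes=13, primaries=1, replicas=1)

print(current, current // DATA_NODES)    # 130 shards in total, 65 per node
print(proposed, proposed // DATA_NODES)  # 26 shards in total, 13 per node
```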
In our cluster, we are not using OpenSearch; the Elasticsearch version is 7.10. Also, just to add, we are using Elasticsearch client 7.9.3. We haven't upgraded it yet, so I am not sure whether that will have any impact.
Will try changing our primary shard count to 1. Thanks for the suggestion, very much appreciated.
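One thing to keep in mind: the number of primary shards is fixed when an index is created, so moving to 1 primary usually means creating a new index and reindexing into it (the shrink API is another option). A rough sketch of the two request bodies involved is below. The index names `orders` and `orders-v2` are hypothetical, and the new index would also need your mappings, which are left out here.

```python
import json
from typing import List, Tuple


def single_shard_reindex_requests(
    source: str, dest: str, replicas: int = 1
) -> List[Tuple[str, str, dict]]:
    """Build (method, path, body) for creating a 1-shard index and reindexing into it."""
    create = (
        "PUT",
        f"/{dest}",
        {"settings": {"index": {"number_of_shards": 1, "number_of_replicas": replicas}}},
    )
    reindex = (
        "POST",
        "/_reindex",
        {"source": {"index": source}, "dest": {"index": dest}},
    )
    return [create, reindex]


# Hypothetical index names, for illustration only.
for method, path, body in single_shard_reindex_requests("orders", "orders-v2"):
    print(method, path, json.dumps(body))
```

The requests are only built and printed here, so they can be reviewed or pasted into a console before anything is run against the cluster.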
OpenSearch/OpenDistro are AWS-run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance. See "What is OpenSearch and the OpenSearch Dashboard?" on the Elastic site for more details.
(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns.)
I would recommend that you upgrade to the latest version, as your version is very old and has been EOL for a long time. For this particular issue, I would not expect an upgrade to have a major impact, though. Changing IOPS and the number of primary shards is likely to have a bigger impact.