I have 4638 indices, 5 primary shards with 1 replica each
So 4200 shards per data node. You need to reduce your shard count by one to two orders of magnitude.
I'm assuming you don't have indices that each contain the same document type. If you do, merge them.
I have about 42 indices per day (varying sizes ranging from 3MB to 20GB).
First thing: you don't have a single index that needs 5 primary shards. Reconfigure so you have 1 primary shard per index. A 20GB shard is no problem (particularly when the alternative is having far too many shards).
In a few months you'll be down to 840 shards per data node, much faster if you use the _reindex API. Still too many.
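As a sketch of that _reindex route (the index names here are placeholders, not from your cluster): create a new 1-shard index, copy the documents into it, then delete the original once you've verified the copy.

```
# Create the 1-shard destination (names are examples only)
PUT /logs-2018.06.01-reindexed
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 1 }
}

# Copy the documents across
POST /_reindex
{
  "source": { "index": "logs-2018.06.01" },
  "dest":   { "index": "logs-2018.06.01-reindexed" }
}

# After verifying the doc counts match, remove the 5-shard original
DELETE /logs-2018.06.01
```

An alias pointing at the new index keeps existing queries working under the old name.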
A 3MB daily index, if it's the same index that is 3MB every day, could easily accommodate 20 years of data in a single shard. But let's settle for a month.
Let's work out a 3-tier strategy. Monthly, Weekly, Daily.
Identify the indices that accumulate less than, say, 1GB/month. Configure those to be monthly indices. For these, the choice to use _reindex is a no-brainer; it won't be very taxing. 1 shard, 1 replica.
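Rolling a month of those small daily indices into one monthly index is a single wildcard _reindex. Again, the names and date pattern below are illustrative; adjust them to your own naming scheme:

```
# Monthly destination index: 1 primary shard, 1 replica
PUT /metrics-2018.06
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 1 }
}

# Pull in every daily index for the month via wildcard
POST /_reindex
{
  "source": { "index": "metrics-2018.06.*" },
  "dest":   { "index": "metrics-2018.06" }
}
```

Once the monthly index is verified, the daily sources can be deleted.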
Identify indices that accumulate 1GB - 5GB a month. Configure those to be weekly indices. 1 shard, 1 replica. Use _reindex for these as well.
Leave the rest as daily, but still, 1 shard, 1 replica.
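To make sure newly created daily indices pick up the 1-shard setting automatically, an index template does it at creation time (the template name and pattern here are examples, assuming your daily indices share a common prefix):

```
PUT /_template/single_shard_daily
{
  "index_patterns": ["logs-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

This only affects indices created after the template exists; the existing ones still need the _reindex treatment above.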
Now, all those replicas. Are you taking snapshots? If you aren't, start. Then you can start to consider dropping the replica on your old, read-only indices. With that many shards per node, you aren't gaining any read throughput from replicas; they only reduce your risk of losing all copies of a shard. Two copies are better than one, but snapshots will alleviate that concern.
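A minimal sketch of that, assuming a shared-filesystem repository (the repository name, path, and index pattern are placeholders; the path must be listed under path.repo in elasticsearch.yml on every node):

```
# Register a filesystem snapshot repository
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": { "location": "/mount/es_backups" }
}

# Take a full snapshot and wait for it to finish
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true

# Then drop the replica on old, read-only indices
PUT /logs-2018.01.*/_settings
{
  "index": { "number_of_replicas": 0 }
}
```

Keep the replica on the indices you're actively writing to; it's the cold, already-snapshotted ones where dropping it is safe.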
Implementing these changes will make a spectacular difference in how your cluster performs, for both indexing and searching.