Thank you very much for the prompt response.
Let me clarify our case further. We have, say, "main" or "important" indices. These vary between 300 GB and 900 GB in size, and they all have 30*2 shards (30 primaries, each with one replica). We also have some very small (and not so important) indices with just a few shards, some of them with no replicas. They are small both in size and in number of documents, and we normally do not search them at all. I just did not realize how many of these indices we have and how they could skew the real picture. I took the total number of indices (335) from Cerebro without giving it much thought. My apologies for that. From now on, we will talk about the important indices only.
You are absolutely correct that we try to maintain an even distribution of data across all nodes. We also want to make sure we can accommodate a huge spike in the number of generated logs and increase the indexing rate as necessary without hitting back pressure. We deliver logs to Elasticsearch through Kafka, and if there is a Kafka lag for whatever reason, we want to close the lag as fast as possible. Just a few days ago, when that happened, we reached an indexing rate of 40K+ docs/sec in a single index. That's why we thought it would be best to have as many primary shards as there are data nodes in the cluster.
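For context, our important indices are created with settings roughly like this (the index name is illustrative; 30 matches our data node count):

```
PUT /logs-2017.10.01
{
  "settings": {
    "index.number_of_shards": 30,
    "index.number_of_replicas": 1
  }
}
```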
When the indices get old and are no longer being written to, we want to optimize them for search.
The idea is to move them to an idle node and shrink them there one by one. Naturally, we want to keep that idle node as busy as possible during the shrink, so the question is what settings we could change to achieve that. Right now the shrink takes too much time, and the node seems to be underutilised during this process.
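To be concrete, this is roughly the sequence we run per index (the node name, index names, and the target shard count of 5 are just examples; 5 divides 30 evenly, as the shrink API requires):

```
# 1. Relocate a copy of every shard to the idle node and block writes
PUT /logs-2017.09.01/_settings
{
  "index.routing.allocation.require._name": "idle-node-1",
  "index.blocks.write": true
}

# 2. Shrink into a new index with fewer primaries
POST /logs-2017.09.01/_shrink/logs-2017.09.01-shrunk
{
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_replicas": 0,
    "index.routing.allocation.require._name": null
  }
}

# 3. Force-merge the shrunken index down to one segment per shard
POST /logs-2017.09.01-shrunk/_forcemerge?max_num_segments=1
```

It is mostly during steps 2 and 3 that the idle node looks underutilised.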
I found some tips about changing the "indices.store.throttle.max_bytes_per_sec", "indices.store.throttle.type", and "index.merge.scheduler.max_thread_count" settings, but they seem to apply to 1.x/2.x versions only.
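If I read those tips correctly, on 1.x/2.x they were applied along these lines (the values here are hypothetical; another variant set "indices.store.throttle.type" to "none" to disable store throttling entirely):

```
PUT /_cluster/settings
{
  "transient": {
    "indices.store.throttle.max_bytes_per_sec": "200mb"
  }
}

PUT /logs-2017.09.01/_settings
{
  "index.merge.scheduler.max_thread_count": 4
}
```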
I am wondering if there are similar settings in 5.x.
Thank you very much for all of your help!