I would like to ask how to organize data in my cluster so that optimal performance and stability can be achieved.
My use case:
Every day the cluster ingests around 10 GB of documents, and it currently holds around 3 TB of data (measured by disk usage). Metrics are later calculated on those documents using scripted metric aggregations. The cluster is queried quite rarely, let's say around 1,000 requests per day. As time goes by the stored data size increases, and eventually I will have to delete some of that data.
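For context, the scripted metric aggregations have roughly the shape below. This is only a skeleton built as a Python dict; the aggregation name, field name, and Groovy scripts are placeholders for illustration, not my real ones (ES 2.x uses the `_agg`/`_aggs` variables shown here):

```python
# Skeleton of a scripted_metric aggregation request body for ES 2.x.
# "my_metric" and "some_field" are placeholder names, and the Groovy
# scripts are simplified; the real scripts are more involved.
agg_body = {
    "size": 0,  # we only want the aggregation result, not hits
    "aggs": {
        "my_metric": {
            "scripted_metric": {
                "init_script": "_agg.values = []",
                "map_script": "_agg.values.add(doc['some_field'].value)",
                "combine_script": "return _agg.values.sum()",
                "reduce_script": "total = 0; for (v in _aggs) { total += v }; return total",
            }
        }
    },
}
```

The body is sent with a normal search request against the client's index.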
The stored data belongs to clients, which in turn have subclients. In the cluster, indices represent clients and document types represent subclients; all documents share the same template, so the indices are not time-based. Deleting data from this layout would be very inefficient, so I have considered switching to time-based indices (maybe weekly or monthly). That raises another problem: the cluster currently has around 700 indices, which at 5 primary shards each means 3,500 primary shards plus another 3,500 replica shards (1 replica per primary). I don't know how the cluster will be impacted as the shard count grows further, and I have read that having many shards is bad for a cluster.
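To put numbers on the shard-count concern, here is the arithmetic; the 12-month retention figure is only an assumption for illustration, not a decided policy:

```python
# Shard-count arithmetic for the current layout and a hypothetical
# monthly time-based layout. Retention of 12 months is an assumption.
clients = 700             # one index per client today
primaries_per_index = 5
replicas = 1

# Current layout: one index per client.
current_total = clients * primaries_per_index * (1 + replicas)
print(current_total)      # 7000 shards across 8 nodes

# Monthly time-based indices per client, keeping 12 months of data.
months_retained = 12
monthly_total = clients * months_retained * primaries_per_index * (1 + replicas)
print(monthly_total)      # 84000 shards
```

Even before the switch, that is 7,000 shards on 8 nodes; per-client monthly indices at the default shard settings would multiply that by the retention window, which is exactly what worries me.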
Elasticsearch version: 2.2
node count: 8
master+data nodes: 3
data-only nodes: 5
primary shards per index: 5
replicas per index: 1
Any advice on this situation or on general cluster setup is highly appreciated. Feel free to ask for any additional information.