Let us say I have 300 GB of data coming in from one client per day.
I store this data on the hot nodes, and at the end of the day I move it to the warm nodes.
For this 300 GB, in order to keep shards at an acceptable size (let us say 37 GB each), I need to distribute it across 8 shards (one per node): 37 GB × 8 ≈ 300 GB.
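As a sanity check, the shard count here is just a ceiling division of the daily volume by the largest shard size you will tolerate. A minimal sketch, using the 300 GB and ~40 GB figures from this example:

```python
import math

def shards_needed(daily_gb: float, max_shard_gb: float) -> int:
    """Minimum number of primary shards so that no shard exceeds max_shard_gb."""
    return math.ceil(daily_gb / max_shard_gb)

# 300 GB/day capped at ~40 GB per shard -> 8 shards of ~37.5 GB each
n = shards_needed(300, 40)
print(n, 300 / n)
```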
But if I introduce data streams, then I can do something like this:
Distribute the incoming data across 4 shards; as a shard reaches 40 GB, roll over and move the hidden backing index (and its shards) to the warm nodes, thus freeing the hot nodes.
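This rollover-at-40-GB setup maps onto an ILM policy attached to the data stream. A rough sketch of what I have in mind (the policy name is a placeholder, and the warm phase relies on the default `migrate` action to move indices to the warm tier):

```json
PUT _ilm/policy/daily-client-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "40gb" }
        }
      },
      "warm": {
        "min_age": "0ms",
        "actions": {
          "migrate": {}
        }
      }
    }
  }
}
```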
At this point the problem becomes: how many shards do I have to allocate for a given maximum throughput in Mbit/s?
Let us say that for my client producing the daily 300 GB index, the peak rate is 15 Mbit/s.
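For what it is worth, the unit conversion is easy to sanity-check: 15 Mbit/s sustained for a full day comes to about 162 GB, while an even 300 GB/day works out to roughly 28 Mbit/s on average, so I assume the 15 Mbit/s figure is a peak rather than a sustained rate. A quick sketch (decimal GB assumed):

```python
SECONDS_PER_DAY = 86_400

def mbit_s_to_gb_per_day(mbit_s: float) -> float:
    """Sustained throughput in Mbit/s -> decimal GB per day."""
    return mbit_s / 8 / 1000 * SECONDS_PER_DAY

def gb_per_day_to_mbit_s(gb_day: float) -> float:
    """Daily volume in decimal GB -> average rate in Mbit/s."""
    return gb_day * 8 * 1000 / SECONDS_PER_DAY

print(mbit_s_to_gb_per_day(15))   # 162.0 GB if 15 Mbit/s ran all day
print(gb_per_day_to_mbit_s(300))  # ~27.8 Mbit/s average for 300 GB/day
```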
Or should I approach the problem from a different angle? Let us say one node can handle 15 Mbit/s plus 10 other, smaller clients; should I then be worried about query performance instead?
That is, should this problem be approached primarily from the query-performance side? In that case the latest backing index, which is always below 40 GB, can be queried easily.
Another question: how do you measure the maximum ingestion capacity of a node in Mbit/s?
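One approach I can think of (my assumption, not an established method) is to benchmark: bulk-index representative data while sampling the node's indexing stats (e.g. `indices.store.size_in_bytes` from the `_nodes/stats` API), then convert the byte delta over the interval into Mbit/s. The conversion itself is trivial; the sample numbers below are made up for illustration:

```python
def ingestion_rate_mbit_s(bytes_start: int, bytes_end: int, interval_s: float) -> float:
    """Average ingestion rate over a sampling window, in Mbit/s."""
    return (bytes_end - bytes_start) * 8 / 1_000_000 / interval_s

# Example: on-disk size grew by 1.5 decimal GB during a 10-minute bulk load
rate = ingestion_rate_mbit_s(0, 1_500_000_000, 600)
print(rate)  # 20.0 Mbit/s
```

Note that the store-size delta is only a proxy for ingested bytes: replication, segment merges, and compression all distort it, so measuring the raw bytes sent by the bulk client would be more faithful.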