Shard Count based on incoming data (MBits/sec) or max MBits/sec of Node (datastream included)?

LaszloE · January 10, 2024, 8:20pm

Let us say I have 300 GB of data coming in from one client per day.
I store this data on the hot nodes, and at the end of the day I move it to the warm nodes.

To keep this 300 GB in acceptably sized shards (let us say 37 GB each), I need to distribute it across 8 shards (nodes): 37 * 8 ~ 300.
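To make the arithmetic explicit, here is a rough sketch of the calculation. The 40 GB cap is an assumption chosen for illustration, and the function names are hypothetical, not part of any Elasticsearch API:

```python
import math


def min_primary_shards(daily_gb: float, max_shard_gb: float) -> int:
    """Smallest number of primary shards that keeps each shard at or below max_shard_gb."""
    return math.ceil(daily_gb / max_shard_gb)


def size_per_shard_gb(daily_gb: float, shards: int) -> float:
    """Average shard size if the daily volume is spread evenly across the shards."""
    return daily_gb / shards


if __name__ == "__main__":
    shards = min_primary_shards(300, 40)   # assumed cap of 40 GB per shard
    print(shards, size_per_shard_gb(300, shards))
```

With 300 GB per day and a 40 GB cap this gives 8 shards of 37.5 GB each, which matches the estimate above. It ignores replicas and any size change between raw data and the indexed data on disk.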

But if I introduce data streams, then I can do something like this:
Distribute the incoming data across 4 shards, and as the shards reach 40 GB, move the "hidden index" (or hidden shards) to the warm nodes, thus freeing the hot nodes.
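What I have in mind could be sketched roughly like this, as the request bodies for an ILM policy and an index template built in Python. The names `client-a-policy` and `client-a-*` are hypothetical, and I am assuming a cluster that uses data tiers, where ILM moves an index to the warm tier when it enters the warm phase:

```python
import json

POLICY_NAME = "client-a-policy"  # hypothetical policy name


def build_ilm_policy(max_shard_size: str = "40gb") -> dict:
    """Body for PUT _ilm/policy/<name>: roll over at a shard size, then go warm."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over once any primary shard reaches the size limit.
                        "rollover": {"max_primary_shard_size": max_shard_size}
                    }
                },
                "warm": {
                    # 0d: enter the warm phase right after rollover.
                    "min_age": "0d",
                    "actions": {"set_priority": {"priority": 50}},
                },
            }
        }
    }


def build_index_template(primary_shards: int = 4) -> dict:
    """Body for PUT _index_template/<name>: a data stream with 4 primary shards."""
    return {
        "index_patterns": ["client-a-*"],  # hypothetical pattern
        "data_stream": {},
        "template": {
            "settings": {
                "index.number_of_shards": primary_shards,
                "index.lifecycle.name": POLICY_NAME,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(), indent=2))
    print(json.dumps(build_index_template(), indent=2))
```

This is only a sketch of the idea, not a tested configuration. The point is that each backing index rolls over by shard size instead of by day, so the hot nodes only ever hold the current write index plus whatever has not yet been relocated.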

At that point the problem becomes: how many shards do I have to allocate for a given maximum MBits/sec?

Let us say that for my client producing the daily 300 GB index, the max is 15 MBits/sec.
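For reference, a small sketch to convert between a daily volume and a sustained rate. It assumes decimal units (1 GB = 1000 MB, 1 byte = 8 bits) and a perfectly even rate over 24 hours, and the function names are only illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def avg_mbit_per_sec(gb_per_day: float) -> float:
    """Average rate in Mbit/s needed to deliver gb_per_day evenly over one day."""
    megabits = gb_per_day * 1000 * 8  # GB -> MB -> Mbit
    return megabits / SECONDS_PER_DAY


def gb_per_day(mbit_per_sec: float) -> float:
    """Daily volume in GB produced by a rate sustained for a full day."""
    megabits = mbit_per_sec * SECONDS_PER_DAY
    return megabits / 8 / 1000  # Mbit -> MB -> GB


if __name__ == "__main__":
    print(round(avg_mbit_per_sec(300), 2))
    print(gb_per_day(15))
```

Under these assumptions 300 GB per day averages out to roughly 27.8 Mbit/s, while 15 Mbit/s sustained for a whole day is about 162 GB, so the two figures may need to be reconciled (for example, if the 300 GB is the on-disk index size and not the raw network volume).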

Or should I approach the problem in a different way? Let us say one node can handle 15 MBits/sec plus 10 other smaller clients; should I then be worried about query performance?

Should this problem be approached more from the side of query performance? But in that case the latest index, which is always below 40 GB, can be queried easily.

Put another way: how do you measure the maximum ingestion capability of a node in MBits/sec?

system · February 7, 2024, 8:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
