Hello,
I am storing events from different sources; the data arrives continuously and grows to a huge size. I want to implement an effective data retention strategy.
I want to keep the last 14 days of data on SSD storage so that search performance is very fast.
I have an SSD server with 75 TB of capacity, and I also have HDD storage of the same capacity, because the data is huge and is only deleted after a considerable period.
To make the search faster, I came up with the following strategy:
SSD storage
I will divide the SSD server into five nodes, each with the following specifications:
- 64 GB RAM per node
- 28 CPU cores per node
- 15 TB SSD per node
This will be used to store the last 14 days of data for fast searches, and these nodes will serve as “hot” nodes.
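For reference, here is a rough sketch of the planned elasticsearch.yml for each hot node, assuming Elasticsearch 7.10+ node roles (the node name and data path below are placeholders, not final values):

```yaml
# Hot data node: holds the newest 14 days of indices on local SSD.
# node.name and path.data are placeholders for illustration only.
node.name: hot-node-1
node.roles: [ data_hot, data_content, ingest ]
path.data: /mnt/ssd/elasticsearch   # data disk (SSD), separate from the OS disk
```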
HDD storage
I am thinking of dividing it as follows:
Five nodes, each with the following specifications:
- 64 GB RAM
- 32 CPU cores
- 15 TB HDD
This will be used to store older data (more than 14 days old), and these nodes will serve as “warm” nodes.
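The warm nodes would get the corresponding role, again only as a sketch with placeholder values:

```yaml
# Warm data node: holds indices older than 14 days on local HDD.
node.name: warm-node-1
node.roles: [ data_warm ]
path.data: /mnt/hdd/elasticsearch   # data disk (HDD), separate from the OS disk
```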
Master nodes
I will also create 3 dedicated master nodes with the following specifications:
- 32 GB RAM
- 16 CPU cores
- 1 TB SSD
Note: In this strategy, each node will have a separate disk for the operating system, apart from the data disk.
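For the dedicated masters, only the master role would be assigned (placeholder name again):

```yaml
# Dedicated master node: cluster coordination only, no data.
node.name: master-node-1
node.roles: [ master ]
```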
Shard allocation
Now for shard and replica allocation:
I will create 3 primary shards for each index, each about 40 GB in size, and each shard will have 1 replica.
The data can reach up to 2 TB per day, so I will create daily indices. Each index will be about 120 GB, since it has 3 shards of 40 GB each.
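To tie the 14-day hot-to-warm move to these daily indices, I am thinking of an ILM policy plus an index template along these lines. This is only a sketch assuming Elasticsearch 7.10+ with data tiers; the names events-ilm, events-template, and the events-* pattern are placeholders:

```
PUT _ilm/policy/events-ilm
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {}
      },
      "warm": {
        "min_age": "14d",
        "actions": {}
      }
    }
  }
}

PUT _index_template/events-template
{
  "index_patterns": ["events-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 1,
      "index.lifecycle.name": "events-ilm",
      "index.routing.allocation.include._tier_preference": "data_hot"
    }
  }
}
```

Since these are daily indices without rollover, min_age is counted from each index's creation date, so an index should be migrated to the warm (HDD) nodes roughly 14 days after it is created.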
I would like some help to make sure that this strategy is good for achieving very fast search performance.
- Is this node layout correct, or are there any suggestions?
- Is this shard allocation optimal for achieving very fast search?