My cluster has 3 master nodes and 6 data nodes. Each data node has 64GB of memory, 24 cores, and 80TB of disk.
I would like to know whether it is OK for my data nodes to fill up their disks (about 80TB per node).
What happens to performance when the disks fill up with that much data?
As outlined in this blog post, each shard requires a certain amount of heap. Exactly how much depends on the data, mappings, and number of segments, as well as the size of the shard.
How much heap this shard-related overhead can be allowed to consume depends on how much heap you need left over to handle the indexing and query load. Given that you will have a heap of just below 32GB, you are in my opinion unlikely to be able to use all that disk space.
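To give a feel for why, here is a rough back-of-envelope sketch in Python. The figures used (around 20 shards per GB of heap and shards in the tens of GB) are commonly cited rules of thumb, not hard limits, so treat the result as an order-of-magnitude estimate only:

```python
# Back-of-envelope estimate of addressable storage per data node.
# Assumed rules of thumb (not hard limits):
#   - roughly 20 shards per GB of JVM heap
#   - an average shard size of about 40 GB
heap_gb = 30                 # just under the ~32GB compressed-oops limit
shards_per_gb_heap = 20      # assumed rule of thumb
avg_shard_size_gb = 40       # assumed average shard size

max_shards = heap_gb * shards_per_gb_heap                 # ~600 shards
addressable_tb = max_shards * avg_shard_size_gb / 1024    # ~23 TB

print(f"~{max_shards} shards -> ~{addressable_tb:.0f} TB per node")
```

Even with fairly generous assumptions this lands somewhere in the 20-30TB range per node, which is why I doubt the full 80TB will be usable in practice.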
When it comes to performance, this will depend a lot on your data and the types of queries, but also on how much of the data you are querying in each request. As Elasticsearch is often limited by disk I/O, the speed of your storage will also matter, especially when you have a lot of data per node.
As there are a large number of parameters that affect this, your best bet is to run some benchmarks to find out how your data and queries behave and how the storage volume affects performance.
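As a starting point, here is a minimal sketch of that kind of measurement using the official Python client (an 8.x-style client, a local cluster, and a hypothetical index named my-index are all assumptions here). For serious benchmarking a dedicated tool like Rally is a better fit, but this shows the idea of tracking query latency as the data volume per node grows:

```python
import time
from elasticsearch import Elasticsearch

# Hypothetical connection details and index name.
es = Elasticsearch("http://localhost:9200")
INDEX = "my-index"

def measure_search_latency(query: dict, runs: int = 20) -> float:
    """Run the same query repeatedly and return the average latency in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        es.search(index=INDEX, query=query, size=10)
        timings.append((time.perf_counter() - start) * 1000)
    return sum(timings) / len(timings)

# Example: a simple full-text query. Re-run this as the index grows
# to see how latency changes with the amount of data per node.
avg_ms = measure_search_latency({"match": {"message": "error"}})
print(f"average search latency: {avg_ms:.1f} ms over 20 runs")
```

Repeating the same query set at different data volumes will show you where latency starts to degrade for your particular data and queries.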
This Elastic{ON} talk might also be useful.
If you can tell us a bit about your data and use-case, we may be able to give some more detailed guidance.
Thanks!