Hello, can you help?
To store big data, what is the best type of HDD storage that is also fast?
For example, is there a faster type than a QNAP NAS?
What is fast enough for your use case?
Local SAS RAID-6 was fast enough for ours. 6 hot nodes, 2 cold. It’s for logs.
If you want 'fast' storage for Elasticsearch, you typically need SSD-backed storage, as described in other threads. As soon as you venture into local or networked storage based on HDDs, you are in my experience looking for something 'fast enough', which will depend heavily on the use case (which means you need to test).
HDDs can offer good performance for workloads with large sequential reads/writes, and in my experience this is often the type of workload used when specifying performance metrics for HDD-based storage. Unfortunately this is generally not the access pattern that an Elasticsearch workload tends to exhibit, as random reads and writes are the norm rather than the exception.
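To make the sequential-versus-random distinction above a bit more concrete, here is a minimal Python sketch that reads the same file block by block in both orders and times each pass. The helper names, the 4 KiB block size and the file size are assumptions chosen for illustration. On a small, freshly written file the OS page cache will hide most of the difference, so treat this only as a way to see the two access patterns; for real numbers, run a dedicated tool such as fio against the actual device.

```python
import os
import random
import tempfile
import time

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def read_blocks(path, offsets, block_size=BLOCK_SIZE):
    """Read one block at each offset, in the given order; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(block_size))
    return total


def compare_access_patterns(path, block_size=BLOCK_SIZE, seed=0):
    """Time a sequential pass and a shuffled (random) pass over the same blocks."""
    size = os.path.getsize(path)
    sequential = list(range(0, size, block_size))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)

    results = {}
    for name, offsets in (("sequential", sequential), ("random", shuffled)):
        start = time.perf_counter()
        bytes_read = read_blocks(path, offsets, block_size)
        results[name] = {
            "bytes": bytes_read,
            "seconds": time.perf_counter() - start,
        }
    return results


def make_test_file(num_blocks=256, block_size=BLOCK_SIZE):
    """Create a temporary file holding num_blocks * block_size random bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(num_blocks * block_size))
    return path


if __name__ == "__main__":
    path = make_test_file()
    try:
        for name, r in compare_access_patterns(path).items():
            print(f"{name}: {r['bytes']} bytes in {r['seconds']:.4f}s")
    finally:
        os.remove(path)
```

Both passes read exactly the same data; only the order of the offsets differs, which is the part that hurts on spinning disks because every jump costs a seek.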
Would it be suitable for big data?
Ok
I now have 5 servers with SSD storage, which I will use for the hot layer.
I want a type of storage for the cold layer. Is there a type you recommend? My problem now is the cold layer: what is an appropriate type of storage that costs less than SSD?
Is there a type you recommend me to use or a specific model?
Specifications of the servers I have:
RAM: 1.5 TB DDR4
CPU: 2 x Intel Xeon Platinum 8352V
Storage: 70 TB SSD
Should I run one node on each server, or more than one node?
What is the best practice: one node per server or several?
That is, if I split a server, some resources will be lost due to the default partition, etc.
This will depend on your use case and what performance requirements you have when querying this data. You will need to test to see what is fast enough for you. A lot of logging use cases can deal with high density nodes on slower storage for older data as they often rely on aggregations and do not retrieve documents from random places on disk. If you instead have a search use case the performance characteristic will likely look different.
Typically you make one node per server, as you do not want nodes to fight over the page cache. If you have very large servers it often makes sense to host multiple nodes per server, but then it is common to use containers or virtualisation to separate them and allocate resources to the nodes.
Okay, let's assume the worst case, which is that there is a lot of searching on this data.
What type of storage do you recommend, after SSD, for the cold layer?
What about these specifications?
Specifications of the servers I have:
RAM: 1.5 TB DDR4
CPU: 2 x Intel Xeon Platinum 8352V
Storage: 70 TB SSD
As I have stated previously, you will need to test, as it will depend a lot on your particular use case, data and access patterns. I would recommend running a proof-of-concept to see how your use case works with this hardware and whether it meets your requirements. I do not think anyone here can look at a server specification and say with any accuracy whether it will work for you or not, as there are a lot of unknowns.
I understand that I need to run an experiment to determine this.
But I want to know this before buying, so that I buy the best option.
Can you tell me what the best type of HDD storage is for Elasticsearch to run on?
For example, the storage I had in mind is a QNAP NAS. Is there a better type? Is there a model you recommend?
I cannot tell you that, as I know almost nothing about the use case and am not familiar with the products you mention. Rather than going ahead and buying hardware based on guesses and recommendations on a public forum, I would recommend you run some tests/benchmarks in the cloud first. There you can test with different types of storage and experiment with different levels of IOPS to see what effect they have, without a large upfront commitment.
If the problem is not knowing the use case, I will tell you whatever you want to know.
Others can also benefit from that.
I will try the cloud as well, but that takes time, and I am trying to save time. In the end some data will be in the cloud too, but only after the initial setup is done. The cloud is easy to work with, so what I am interested in is the part that will be local, because managing it takes some effort.
Can you give me an example of some types?
I can not give any further guidance. I do however believe you are making a big mistake by not running a POC to validate the use case and test out hardware profiles. Good luck!
Ok
thank you.
Can I wholeheartedly second what Christian has written about a POC. Consider using a cloud service too: it is pay as you go, and might save you from purchasing mistakes.
tbh I think you asked the wrong question. If you think the specific cold layer storage model is the most significant factor, but have no use-case justification for that assumption, my hunch is you are going in the wrong direction.
If I've understood correctly, you have already decided (why?) to have 5 servers with approx. 7.5 TB of total RAM and 350 TB of total local storage, which you plan to use for a hot layer? I presume your hot layer will have at least 2 copies of each document, so a maximum of ca. 160 TB of hot data available at a time. Does that match your maximum rate of ingress? Have you validated that this ingress rate is achievable within your infrastructure?
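The back-of-the-envelope numbers above can be reproduced with a short calculation. This is only a sketch: the function name is made up, and the 0.9 fill factor is my assumption (it roughly corresponds to leaving headroom below Elasticsearch's default high disk watermark), so it is one plausible way to land in the region of the quoted figure rather than the way it was actually derived.

```python
def hot_tier_estimate(servers, ram_tb_per_server, disk_tb_per_server,
                      copies=2, max_disk_fill=0.9):
    """Rough hot-tier sizing from per-server specs.

    copies        -- primary plus replicas (2 = one replica); an assumption
    max_disk_fill -- fraction of disk you are willing to fill; an assumption
    """
    total_ram_tb = servers * ram_tb_per_server
    raw_disk_tb = servers * disk_tb_per_server
    # Every document is stored `copies` times, and some disk must stay free.
    unique_data_tb = raw_disk_tb * max_disk_fill / copies
    return {
        "total_ram_tb": total_ram_tb,
        "raw_disk_tb": raw_disk_tb,
        "unique_data_tb": unique_data_tb,
    }


if __name__ == "__main__":
    # Specs quoted earlier in the thread: 5 servers, 1.5 TB RAM, 70 TB SSD each.
    estimate = hot_tier_estimate(servers=5, ram_tb_per_server=1.5,
                                 disk_tb_per_server=70)
    for key, value in estimate.items():
        print(f"{key}: {value:.1f}")
```

Dividing `unique_data_tb` by the number of days you want to keep in the hot layer gives the daily ingest volume the hot tier could hold, which is the number to compare against your real ingest rate.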
What's your data's lifecycle? How often is the data queried? Would any critical queries likely span the hot and cold layers? Why are you even envisaging having a cold layer? How much data do you want to keep at the cold layer, and more importantly why?
If you are willing to run in the cloud it might be worthwhile looking into Elasticsearch Serverless, which is now GA. This uses object (blob) storage for all data and decouples ingest and querying, so might be a good fit.
I have multiple data nodes with NVMe and a cold storage node with a slow storage unit.
Each hot node has two NVMe drives in a stripe, which is the fastest I can get. I have tested SSD and HDD as well; they didn't come near NVMe speed.
Using the cloud is part of the plan and I will certainly use it, but with this much data it will be expensive, so I will not store all of it in the cloud. Some data will be stored locally, which is why I am asking about the type of storage. I am still in the early stages and studying the subject, and I want to know all the possibilities. Using the cloud is inevitable, and Elasticsearch makes that easy. The most difficult part is managing the data locally, so I am asking about that side only.
For example, if you store data in the cold layer on poor storage, access will be slow and so on. Some storage types are better than others in capacity and speed.
For example, NAS and SAN: SAN is better than NAS. What I meant by my question is: what is the most appropriate and fastest storage to use with Elasticsearch? This is of course for the cold layer.
The hot layer will undoubtedly be SSD, and some data will be stored in the cloud, which is necessary and will be the best option for that data. Because I want all data to be accessible quickly, I asked about the fastest type.
Yes, I understand, and I have studied it for that reason, but it will be expensive, is that right? If I divide the data into two layers, hot and cold, only the last 14 days will be stored in the hot layer. Data older than 14 days will be stored in the cold layer, which uses HDD storage, but of an efficient type.
This way the number of servers will decrease and the cost will decrease.
As for validation, yes, I did that. I ran some experiments, and I calculated that the data will be kept for a rather long time, not just one month.
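A hot/cold split like the one described (14 days hot, then cold) is normally expressed as an index lifecycle management (ILM) policy. Below is a minimal Python sketch that builds such a policy body and prints it as JSON. The function name and the rollover thresholds (`1d`, `50gb`) are illustrative assumptions, not recommendations, and the sketch assumes the cold nodes are configured with the `data_cold` role so that ILM moves indices to them when they enter the cold phase.

```python
import json


def build_hot_cold_policy(hot_days=14, rollover_max_age="1d",
                          rollover_max_shard_size="50gb"):
    """Build an ILM policy body with a hot phase and a cold phase.

    The rollover thresholds are placeholder values (assumptions).
    Note that min_age for the cold phase is counted from rollover,
    not from the time an individual document was indexed.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    "actions": {
                        "rollover": {
                            "max_age": rollover_max_age,
                            "max_primary_shard_size": rollover_max_shard_size,
                        }
                    },
                },
                "cold": {
                    "min_age": f"{hot_days}d",
                    # Lower recovery priority for older data.
                    "actions": {"set_priority": {"priority": 0}},
                },
            }
        }
    }


if __name__ == "__main__":
    # The printed body could be sent with PUT _ilm/policy/<your-policy-name>.
    print(json.dumps(build_hot_cold_policy(), indent=2))
```

Whether HDD-backed cold nodes are fast enough for data in that phase is still something only a test with your own queries can answer.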