What storage type faster than HDD is suitable for Elasticsearch?

dsagent · November 27, 2024, 5:50pm

Hello,
can you help?

To store big data, what is the best type of HDD storage that is also fast?

For example, I have a QNAP NAS. Is there a faster type than that?

rugenl · November 28, 2024, 12:06am

What is fast enough for your use case?

Local SAS RAID-6 was fast enough for ours. 6 hot nodes, 2 cold. It’s for logs.

Christian_Dahlqvist · November 28, 2024, 11:15am

If you want 'fast' storage for Elasticsearch, you typically need SSD-backed storage, as described in other threads. As soon as you venture into local or networked storage based on HDDs, you are, in my experience, looking for something 'fast enough', which will depend heavily on the use case (which means you need to test).

HDDs can offer good performance for workloads with large sequential reads/writes, and in my experience this is often the type of workload used when specifying performance metrics for HDD-based storage. Unfortunately, this is generally not the access pattern an Elasticsearch workload tends to exhibit, as random reads and writes are the norm rather than the exception.
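To make the sequential-versus-random distinction concrete, here is a minimal Python sketch (not from the thread) that reads the same file once in order and once in shuffled 4 KiB blocks. The 4 KiB block size and the file sizes are assumptions. On a small file both passes are served from the OS page cache, so the timings mean little; on real hardware you would need a file much larger than RAM, or a dedicated tool such as fio, for the gap between HDD and SSD to show up.

```python
import os
import random
import tempfile
import time

BLOCK = 4096  # 4 KiB per read: an assumed "small random I/O" size


def make_test_file(path: str, size_bytes: int) -> None:
    """Write size_bytes of random data to path, 1 MiB at a time."""
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(1 << 20, remaining)
            f.write(os.urandom(n))
            remaining -= n


def read_blocks(path: str, offsets: list[int]) -> float:
    """Read one BLOCK at each offset (unbuffered) and return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start


def compare(path: str, size_bytes: int, seed: int = 0) -> dict[str, float]:
    """Time the same set of blocks read in file order and in shuffled order."""
    sequential = [i * BLOCK for i in range(size_bytes // BLOCK)]
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)
    return {
        "sequential_s": read_blocks(path, sequential),
        "random_s": read_blocks(path, shuffled),
    }


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # 64 MiB demo; real tests need a file larger than RAM
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "bench.bin")
        make_test_file(p, size)
        print(compare(p, size))
```

Both passes touch exactly the same blocks, so any difference comes only from the order of access, which is the property that separates HDD-friendly workloads from typical Elasticsearch ones.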

dsagent · November 30, 2024, 7:59am

Would it be suitable for big data?

dsagent · November 30, 2024, 8:15am

Ok
Now I have 5 servers with SSDs, which I will use for the hot layer.

For the cold layer I need a different type of storage, and that is my problem now: what is the appropriate type for storing data at a lower cost than SSD?

Is there a type or a specific model you would recommend?

Specifications of the servers I have:
RAM: 1.5 TB DDR4
CPU: 2x Intel Xeon Platinum 8352V
Storage: 70 TB SSD

Should I run one node on each server, or more than one?
What is the best practice: one node per server, or several?

That is, if you split a server, some resources will be lost to the default partitioning, etc.

Christian_Dahlqvist · November 30, 2024, 8:47am

This will depend on your use case and what performance requirements you have when querying this data. You will need to test to see what is fast enough for you. A lot of logging use cases can deal with high-density nodes on slower storage for older data, as they often rely on aggregations and do not retrieve documents from random places on disk. If you instead have a search use case, the performance characteristics will likely look different.
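A rough illustration of the two access patterns, written as Elasticsearch request bodies built in Python (a sketch, not from the thread; the field names `@timestamp`, `host.name` and `message` are hypothetical and depend on your mapping). The logging-style request sets `size` to 0, so no hits are returned and only aggregation results come back; the search-style request returns documents, each of which has to be fetched.

```python
import json

# Hypothetical ECS-style field names; adjust to your own index mapping.


def logging_style_query(days: int = 30) -> dict:
    """Aggregation-only request: "size": 0 returns no hits, so no document
    _source has to be fetched; only per-day, per-host counts come back."""
    return {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": f"now-{days}d"}}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": "1d"},
                "aggs": {"top_hosts": {"terms": {"field": "host.name", "size": 10}}},
            }
        },
    }


def search_style_query(text: str, hits: int = 50) -> dict:
    """Search request: returns up to `hits` matching documents, whose _source
    may live in many different places on disk."""
    return {
        "size": hits,
        "query": {"match": {"message": text}},
    }


if __name__ == "__main__":
    print(json.dumps(logging_style_query(), indent=2))
    print(json.dumps(search_style_query("timeout"), indent=2))
```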

Typically you run one node per server, as you do not want nodes to fight over the page cache. If you have very large servers, it often makes sense to host multiple nodes per server, but then it is common to use containers or virtualisation to separate them and allocate resources to the nodes.
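A back-of-the-envelope sketch of what splitting one of the 1.5 TB servers mentioned in this thread into several nodes means for JVM heap versus page cache. It assumes Elastic's general heap guidance (heap no more than about 50% of the node's memory and below the roughly 32 GB compressed-oops threshold; 31 GiB is used here as a conservative cap) and a hypothetical 16 GiB reserved for the host OS. Recent Elasticsearch versions size the heap automatically, so treat this purely as arithmetic, not a recommendation.

```python
# Assumptions (from general JVM heap guidance, not from this thread):
HEAP_FRACTION = 0.5  # heap at most ~50% of the memory given to a node
HEAP_CAP_GIB = 31    # stay below the ~32 GB compressed-oops threshold


def plan_nodes(total_ram_gib: float, nodes: int, os_reserve_gib: float = 16) -> list[dict]:
    """Split one server's RAM evenly across `nodes` containers/VMs.

    os_reserve_gib is a hypothetical amount kept back for the host OS.
    Whatever is not heap is assumed to be left to that node's page cache.
    """
    per_node = (total_ram_gib - os_reserve_gib) / nodes
    heap = min(per_node * HEAP_FRACTION, HEAP_CAP_GIB)
    return [
        {"node": i, "ram_gib": per_node, "heap_gib": heap, "page_cache_gib": per_node - heap}
        for i in range(nodes)
    ]


if __name__ == "__main__":
    for n in (1, 2, 4):
        first = plan_nodes(1536, n)[0]  # 1.5 TB treated as 1536 GiB
        print(n, "node(s):", first)
```

With these assumptions, a single node gets a 31 GiB heap and about 1489 GiB of page cache, while four containerised nodes each get a 31 GiB heap and about 349 GiB of page cache. Which layout actually performs better still has to be established by testing.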

dsagent · November 30, 2024, 8:52am

Okay, let's assume the worst case: there will be a lot of searching on this data.
After SSD, what storage type do you recommend for the cold layer?

dsagent · November 30, 2024, 8:54am

What about these specifications?

Specifications of the servers I have:
RAM: 1.5 TB DDR4
CPU: 2x Intel Xeon Platinum 8352V
Storage: 70 TB SSD

Christian_Dahlqvist · November 30, 2024, 9:05am

As I have stated previously, you will need to test, as it will depend a lot on your particular use case, data and access patterns. I would recommend running a proof of concept to see how your use case works with this hardware and how it meets your requirements. I do not think anyone here can look at a server specification and say with any accuracy whether it will work for you, as there are a lot of unknowns.

dsagent · November 30, 2024, 1:58pm

I understand that I need to run an experiment to determine this.

But I want to know before buying, so that I buy the best option.

Can you tell me which type of HDD storage Elasticsearch works well on?

For example, the storage I have is a QNAP NAS. Is there a better type? Is there a model you would recommend?

Christian_Dahlqvist · November 30, 2024, 4:42pm

I cannot tell you that, as I know almost nothing about the use case and am not familiar with the products you mention. Rather than going ahead and buying hardware based on guesses and recommendations on a public forum, I would recommend you run some tests/benchmarks in the cloud first. There you can test different types of storage and experiment with different levels of IOPS to see what effect they have, without a large upfront commitment.

dsagent · November 30, 2024, 5:11pm

If the problem is not knowing the use case, I will tell you whatever you want to know.
Others may also benefit from that.

I will also try the cloud, but that takes time, and I am trying to save time. In the end, some of the data will be in the cloud as well, after the initial setup. You make things very easy in the cloud, so my interest is in the part that will run locally, because managing it takes more effort.

Can you give me an example of some types?

Christian_Dahlqvist · November 30, 2024, 7:20pm

I cannot give any further guidance. I do, however, believe you are making a big mistake by not running a POC to validate the use case and test out hardware profiles. Good luck!

dsagent · December 1, 2024, 5:47am

Ok
thank you.

RainTown · December 4, 2024, 11:06am

I wholeheartedly second what Christian has written about a POC. Consider using a cloud service too: it is pay-as-you-go and might save you from purchasing mistakes.

TBH, I think you asked the wrong question. If you assume the specific cold-layer storage model is the most significant factor, without a use-case justification for that assumption, my hunch is that you are heading in the wrong direction.

If I've understood correctly, you have already decided (why?) to have 5 servers with approx. 7.5TB total RAM and 350TB of total local storage, which you plan to use for a hot layer? I presume your hot layer will have at least 2 copies of each document, so a maximum of ca. 160TB of hot data available at a time. Does that match your maximum rate of ingress? Have you validated that this ingress rate is achievable within your infrastructure?
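The capacity arithmetic above can be written out as a small Python helper (a sketch; the `usable_fraction` headroom factor and the daily ingest rate are hypothetical inputs, not numbers from this thread). With 2 copies, 350 TB of raw disk halves to 175 TB of unique data before any headroom is reserved; keeping disks below Elasticsearch's default 85% low disk watermark would bring that down to roughly 149 TB.

```python
def hot_capacity(servers: int, ram_tb: float, disk_tb: float,
                 copies: int = 2, usable_fraction: float = 1.0) -> dict:
    """Cluster-wide totals for identical servers.

    copies: primary + replicas (2 means one replica per shard).
    usable_fraction: hypothetical headroom factor, e.g. 0.85 to stay under
    the default low disk watermark.
    """
    total_ram = servers * ram_tb
    total_disk = servers * disk_tb
    unique_data = total_disk * usable_fraction / copies
    return {"ram_tb": total_ram, "disk_tb": total_disk, "unique_data_tb": unique_data}


def days_covered(unique_data_tb: float, daily_primary_ingest_tb: float) -> float:
    """Days of (on-disk, primary) data the hot tier holds at a given ingest rate."""
    if daily_primary_ingest_tb <= 0:
        raise ValueError("daily ingest must be positive")
    return unique_data_tb / daily_primary_ingest_tb


if __name__ == "__main__":
    totals = hot_capacity(servers=5, ram_tb=1.5, disk_tb=70)
    print(totals)
    # 5 TB/day is a made-up ingest rate, for illustration only.
    print(days_covered(totals["unique_data_tb"], 5))
```

For the 5-server setup this gives 7.5 TB of RAM, 350 TB of disk and 175 TB of unique data; at the hypothetical 5 TB/day that is 35 days of hot data, which is the kind of check the question about ingress rate is getting at.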

What's your data's lifecycle? How often is the data queried? Would any critical queries likely span the hot and cold layers? Why are you even envisaging having a cold layer? How much data do you want to keep at the cold layer, and more importantly why?

Christian_Dahlqvist · December 4, 2024, 11:21am

If you are willing to run in the cloud, it might be worthwhile looking into Elasticsearch Serverless, which is now GA. This uses object storage for all data and decouples ingest and querying, so it might be a good fit.

elasticforme · December 4, 2024, 3:56pm

I have multiple data nodes with NVMe and a cold storage node with a slow storage unit.

Each hot node has two NVMe drives striped together; that is the fastest I can get. I have tested SSD and HDD as well, and they didn't come near NVMe speed.

dsagent · December 5, 2024, 7:55am

Using the cloud is part of the plan, no doubt, and I will use it. But storing the large amount of data there would be expensive, so not all data will be in the cloud; some will be stored locally. That is why I am asking about the type of storage. I am still in the early stages, studying the subject and trying to learn all the possibilities. The cloud is inevitable, and Elasticsearch makes things easy there; the most difficult part is managing the data locally, so that is the only side I am asking about.

dsagent · December 5, 2024, 8:06am

The use case requires good access speed: if you store cold-layer data on poor storage, access will be bad, and so on. Some storage types offer better speed than others, for example:
NAS & SAN

SAN is better than NAS. What I meant by my question is: what is the appropriate and fastest storage to use with Elasticsearch? This is, of course, for the cold layer.

As for the hot layer, it will undoubtedly be SSD. Some data will also be stored in the cloud, which is necessary and will be the best option. I asked about the fastest type so that all data can be accessed quickly.

dsagent · December 5, 2024, 8:19am

Yes, I understand; I studied it for that reason, but it will be expensive, right? If I divide the data into two layers, hot and cold, the hot layer will store only the last 14 days. Data older than 14 days will be stored in the cold layer, which uses HDD storage, but an efficient type.
This way, both the number of servers and the cost will decrease.
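The 14-day hot, older-goes-cold split described above maps naturally onto an index lifecycle management (ILM) policy. The sketch below builds such a policy body in Python; the rollover thresholds and the cold-phase priority are assumptions, not something the poster specified. The body would be sent with `PUT _ilm/policy/<policy-name>` (the name is up to you), and the cold-tier nodes would need the `data_cold` role so that ILM can migrate indices onto them. When rollover is used, `min_age` counts from rollover, not from index creation.

```python
import json


def hot_cold_policy(hot_days: int = 14, rollover_size: str = "50gb") -> dict:
    """Build an ILM policy body: roll over in the hot phase, then move
    indices to the cold phase `hot_days` after rollover.

    The rollover thresholds are assumed values; tune them to your data.
    On clusters using data tiers, ILM migrates indices to data_cold nodes
    when they enter the cold phase.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": rollover_size,
                            "max_age": "1d",
                        }
                    }
                },
                "cold": {
                    "min_age": f"{hot_days}d",
                    # Recover cold indices last after a node restart.
                    "actions": {"set_priority": {"priority": 0}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(hot_cold_policy(), indent=2))
```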

As for validation: yes, I did some experiments and calculated the data volume, which will be kept for a rather long time, not just one month.
