What is the largest size that one node can hold?

Hello
I have a lot of data and I want to store it in elasticsearch
What is the largest size that one node can hold?

Thanks for reaching out, @dsagent. I have a few follow-up questions here:

  • What version are you using?
  • How are you hosting Elasticsearch?

I'd also be interested in hearing more from you about what you are looking for here.

Here are some other responses from this forum that may be helpful here:

Recent versions have made a lot of improvements in heap usage, so nodes can hold quite large data volumes. The more data a node holds, the longer it generally takes to query the data. The practical limit therefore often depends on performance requirements and load patterns.

What is your use case and constraints?

8.15.3

local

thank you

I want to know the maximum amount of storage a node can hold

Can I do 30TB in one node?

The use case is large, with a lot of data, and there are no restrictions imposed, even if it needs additional resources.

The important thing is that I store the data, so I am asking how much a node can tolerate

When it comes to exactly how much data a node can hold it is usually limited by heap usage or performance requirements. How much heap is required to hold a specific amount of data on disk will primarily depend on your mappings. Certain features require more data to be stored on the heap and will therefore result in heap becoming a limiting factor faster. Indexing and querying will also require a certain amount of heap so what type of queries and how many you run concurrently will therefore also play a part. The only way to determine this correctly for your use case is to test with real data and queries.

The more data you store on a node, the more disk I/O will likely be required to serve a query, and latencies will generally go up as more data is added to the node, at least if you generally search large parts of the data and are not very targeted. It will also matter whether you run aggregations or searches bringing back documents, as reading a large number of documents from disk can result in a lot of random disk reads. Naturally the type of hardware, specifically storage, impacts this as well.

When you run queries against the data, how targeted will these be (in terms of indices searched) and what kind of latencies are you willing to accept? Is this a simple search use case or will you be using more advanced mappings/features?
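For the "test with real data" advice above, one way to watch heap and disk usage per node while loading test data is to poll the nodes stats API. The following is a minimal sketch, not a definitive tool: the `http://localhost:9200` address without authentication is an assumption (8.x enables TLS and authentication by default, so a real cluster would need credentials), and the helper names `fetch_node_stats` and `summarize_node_stats` are hypothetical.

```python
import json
import urllib.request


def fetch_node_stats(base_url: str = "http://localhost:9200") -> dict:
    """Fetch JVM and filesystem stats from the nodes stats API.

    Assumes an unsecured local node; add TLS and credentials for a real cluster.
    """
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/jvm,fs") as resp:
        return json.load(resp)


def summarize_node_stats(stats: dict) -> list:
    """Reduce a nodes stats response to heap and disk usage per node."""
    rows = []
    for node in stats.get("nodes", {}).values():
        fs_total = node["fs"]["total"]
        total = fs_total["total_in_bytes"]
        available = fs_total["available_in_bytes"]
        rows.append({
            "name": node["name"],
            "heap_used_percent": node["jvm"]["mem"]["heap_used_percent"],
            # Decimal terabytes (1 TB = 10**12 bytes)
            "disk_used_tb": round((total - available) / 1e12, 2),
            "disk_total_tb": round(total / 1e12, 2),
        })
    return rows
```

Calling `summarize_node_stats(fetch_node_stats())` at intervals during a test load shows how heap usage moves as the amount of data on disk grows, which is the relationship that has to be measured for a given set of mappings and queries.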

I'm targeting a lot of them because I store a certain amount of data in the cursor and then move to storage with another pointer

As for the response time, I want it to be fast.

Well, I'll tell you the amount of data.
The data that I have reaches more than 15000 terabytes, so I want to store this data and search it

I do not understand what you mean by this. Will each search target only a single index or a small set of indices or will you need to target most indices?

If you want it to be fast you will need to use fast storage, e.g. local SSDs, and store a lot less data per node. For use cases with very high query concurrency or low latency requirements it is often necessary to ensure all data is cached in the operating system page cache in order to avoid disk I/O.

It would also help if you could quantify what you mean by fast.

Yes, I will use large mappings and change some values, such as increasing the value length from 255 to 3000

And I don't know exactly which features I will use yet. Once I have started, I will use anything that can benefit me

I will target most indices

How many TB?

I mean the speed of response in returning the search results

If I had 15,000 TB (15 PB), how many nodes would I need?

I do not know. You need to test. I do not even know if your requirements are realistic. With that much data you will likely need multiple clusters as there are practical limits to cluster size. This depends on use case.

If you are expecting very fast queries that target most indices and retrieve a large number of hits, potentially at a high query concurrency, I expect you are in trouble and will struggle to find a cost effective solution.
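To put the node-count question in perspective, the arithmetic can be sketched as below. This is only a lower bound on storage, assuming the on-disk index size equals the raw data size and ignoring disk watermarks and other overhead. The 15,000 TB total and the 30 TB per node figure come from this thread; the 10 TB and 2 TB per node values, the replica counts, and the `nodes_needed` helper are hypothetical. It says nothing about whether queries would be fast enough, which still has to be tested.

```python
import math


def nodes_needed(total_tb: float, per_node_tb: float, replicas: int = 0) -> int:
    """Lower bound on data nodes: all copies of the data divided by per-node capacity."""
    if per_node_tb <= 0:
        raise ValueError("per_node_tb must be positive")
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    # Each replica is a full extra copy of the primary data.
    total_with_replicas = total_tb * (1 + replicas)
    return math.ceil(total_with_replicas / per_node_tb)


TOTAL_TB = 15_000  # data volume stated in the thread

for per_node_tb in (30, 10, 2):  # 30 TB is from the thread; 10 and 2 are hypothetical
    for replicas in (0, 1):
        count = nodes_needed(TOTAL_TB, per_node_tb, replicas)
        print(f"{per_node_tb:>2} TB/node, {replicas} replica(s): at least {count} nodes")
```

Even at 30 TB per node with no replicas this comes to 500 nodes, and storing less data per node for faster queries pushes the count into the thousands, which is why multiple clusters are likely to be needed.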


Yes, the requirements are realistic and based on a study

What are these limits?