What are the best servers for storing Elasticsearch data?

Hi

I am storing events from devices in Elasticsearch, and the volume of this event data is very large. I have therefore decided to purchase high-performance servers to make search as fast as possible, ideally around 1 second per query.

I have researched different types of servers that combine a large number of processors with a high number of controllers. I found two candidates:

  • IBM Power E1180: Contains 16 processors, each with 16 cores, and the number of controllers is 16.
  • HPE Superdome Flex: Contains 32 processors, each with approximately 28 to 32 cores, and the number of controllers ranges from 8 to 12.

Which one is better for Elasticsearch?
Are there specific types of servers recommended for databases, especially Elasticsearch?

Elasticsearch performance is IMHO often limited by RAM and/or storage performance rather than CPU. For optimal performance it is generally recommended to use fast, local NVMe SSDs. The ideal configuration will however depend on the use case.

Yes, I agree with you on that point, but servers may come with large amounts of RAM and storage capacity while having relatively few processors.

For example:
I have an existing server with 75 TB of storage and 1.5 TB of RAM, but it has only 140 CPU cores and just 1 controller.

If I divide it into 10 virtual environments, each virtual environment would have 7.5 TB of storage, only 14 CPU cores, and 150 GB of RAM.

Would this number of CPU cores and RAM be sufficient?

What type of storage is it?

That sounds reasonable, but it will as I said depend on the use case and workload. I would recommend you set up and test to see how it behaves for your use case.
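When testing, it is worth tracking latency percentiles rather than just the average, since a "1 second" target in practice means something like "p95 under 1 second". A minimal sketch of summarising a test run (the latency values below are made up for illustration):

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile (p in 0-100) of a list of latency samples."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical per-query latencies in seconds collected during a benchmark.
latencies = [0.4, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 2.0]

print(f"mean: {statistics.mean(latencies):.3f}s")
print(f"p50:  {percentile(latencies, 50)}s")
print(f"p95:  {percentile(latencies, 95)}s")
```

A run like this with an average well under a second can still miss the target for 1 query in 20, which is exactly what the headline hardware figures will not tell you.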

Storage type: SSD

It Depends™ (it always depends). There are lots of factors that might affect performance beyond these simple headline figures. ~~One thing that might be concerning is how much of that 7.5TiB is solid-state storage vs how much is backed by spinning disks.~~ (edit: already answered) But another question is whether this difference even matters to you. The best system is the cheapest one that achieves acceptable performance, and different users have wildly different opinions about what “acceptable performance” means.


I have set up a somewhat similar scenario.

I created 5 nodes, each with the following specifications:

  • 15 TB SSD
  • 256 GB RAM
  • 24 CPU cores

With the current data, performance is still good and fast. However, as the data grows, the load increases and resource consumption rises.

Now, the usage scenario has grown significantly, and even the 75 TB of storage is no longer sufficient for the workload. This has forced me to purchase new servers.

Yes, this is important. I want the search results to return within 1 second or even less. The use case is also large-scale, involving analysis and search using wildcards.

What is the average size of the documents you are indexing?

What is the average size of a result set you expect to have returned?

How many concurrent queries do you expect to need to support? Do these queries search all or a subset of indices?

Be aware that wildcard queries are the most expensive queries in Elasticsearch so you may need to optimise queries and mappings for optimal performance.
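One mitigation worth testing is the `wildcard` field type, which Elasticsearch provides specifically to speed up wildcard and regexp queries on keyword-like data (it indexes n-grams alongside the original value). A sketch of what the mapping and a time-bounded query body could look like, written here as plain Python dicts — the index and field names are hypothetical:

```python
# Hypothetical mapping: put the heavily wildcard-searched field on the
# `wildcard` field type instead of `keyword`.
mapping = {
    "mappings": {
        "properties": {
            "event_message": {"type": "wildcard"},  # optimised for *term* patterns
            "device_id":     {"type": "keyword"},
            "@timestamp":    {"type": "date"},
        }
    }
}

# A wildcard query combined with a date filter, so it only touches the
# last 14 days of data rather than every index in the cluster.
query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"@timestamp": {"gte": "now-14d/d"}}},
            ],
            "must": [
                {"wildcard": {"event_message": {"value": "*timeout*"}}},
            ],
        }
    },
    "size": 100,
}
```

These dicts would then be sent as the request bodies of the create-index and search APIs. The date filter matters: filters are cacheable and cheap, so letting them shrink the candidate set before the expensive wildcard clause runs is usually a win.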

If this is the same use case as described in the other thread, I do recommend that you benchmark and test with real data, queries and workload.

The average document size is 700 KB.

The results I expect to be returned are 20,000 documents per connection, and I have 120 concurrent connections, which means 2,400,000 documents in total.
As for size, based on the average document size of 700 KB, each result set would be around 13.4 GiB, and the total across all 120 connections around 1,602 GiB.

The number of concurrent queries ranges from a minimum of 30 to a maximum of 120.
The queries search across all indices from the last 14 days.
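A quick back-of-envelope on those figures (reading KB as KiB) shows why the 1-second target is so hard:

```python
DOC_SIZE = 700 * 1024    # 700 KiB per document, in bytes
DOCS_PER_QUERY = 20_000
CONNECTIONS = 120

per_result_set = DOC_SIZE * DOCS_PER_QUERY  # bytes returned per connection
total = per_result_set * CONNECTIONS        # bytes across all connections

GIB = 1024 ** 3
print(f"per result set: {per_result_set / GIB:.1f} GiB")  # ~13.4 GiB
print(f"all connections: {total / GIB:.0f} GiB")          # ~1602 GiB
```

Serving roughly 1.6 TiB of documents inside a 1-second window would demand network and storage throughput far beyond what any realistic cluster delivers, which is why shrinking the result sets matters more here than the choice of server.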

Thank you for reminding me about this part. I really want to optimize the queries, but I am forced to use wildcards.
If there is a way to help improve performance, I would appreciate it if you could let me know.

As I stated in my previous reply in the other thread I believe you will find it very challenging to achieve the performance levels you are looking for. Returning large result sets of large documents with high levels of concurrency in under 1 second is IMHO unrealistic.

Yes, it’s the same use case and I am working on optimizing it.
I ran some tests and the results were decent.

Here’s what I did:

  • I divided a server with the following specifications: 75 TB SSD, 1.5 TB RAM, 140 CPUs into the following setup:

5 hot data nodes with the following specs:

  • Roles: data_content, data_hot, ingest
  • 15 TB SSD
  • 24 CPUs
  • 256 GB RAM

Then, on another server, I created 3 master nodes with the following specs:

  • Roles: master only
  • 1 TB SSD
  • 32 GB RAM
  • 16 CPUs

After that, from QNAP storage, I created 1 warm data node with the following specs:

  • Roles: data_warm only
  • 15 TB HDD
  • 64 GB RAM
  • 32 CPUs

Currently, the data size is around 15 TB, and performance is decent compared to a previous use case I had tested. However, it is still not acceptable because the use case is growing very rapidly, and it is expected to grow to 5 times the current size.

You are right about this.
It’s not absolutely necessary for it to be one second; even two seconds would be acceptable once the use case is fully implemented.

What worries me is the possibility of significant slowdowns, which is why I am asking about server types. I am thinking that choosing servers with certain specifications might help somewhat.

Do you need to return the entire document every time or only a select number of fields?
Sub-selecting down to the required / minimum needed fields only can have a measurable positive effect on performance.
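To illustrate, restricting a search to the minimum needed fields is purely a matter of the request body — sketched here as Python dicts with hypothetical field names:

```python
# Default behaviour: every hit carries its full _source (~700 KB here).
full_query = {
    "query": {"match_all": {}},
}

# Source filtering: return only the fields the caller actually needs.
trimmed_query = {
    "query": {"match_all": {}},
    "_source": ["device_id", "@timestamp", "event.summary"],
}
```

For the two-thirds of queries that need only partial data, the trimmed form cuts the bytes fetched, serialised, transferred, and parsed per hit, which adds up quickly at 20,000 hits per query.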


It varies. In some cases, I need to return all the data, but two-thirds of the queries return only partial data.

For a use case like this to work, I believe you will need to know the use case, data and query requirements/patterns in detail in order to properly optimise document structure, data distribution, mappings and queries. That is beyond what can be provided in this forum.

I do not think this can be solved simply by selecting a specific set of hardware, at least not at a reasonable cost.