Optimizing single-node search performance

Hello all,

I am relatively new to Elasticsearch and am currently trying to optimize a single-node ES instance. I am trying to understand a concept about which there seems to be conflicting information.
First of all, the data I am working with is around 130 GB of documents (~230 million docs), currently distributed over 5 shards. It resides on a single node running on a machine with the following specs:

  1. CPU: 2x Intel Xeon E5-2680 v3 (48 threads total)
  2. RAM: 256 GB total
  3. Storage: SATA3 7200 RPM HDDs (ample capacity)

The cluster usage will be very search-heavy with aggregations, with indexing happening infrequently or not at all. Also, data availability is not a concern at the moment. My main question is about how primary shards make use of parallelism within one machine. From what I understand, multiple primary shards allow a single query to be run in parallel on multiple shards using multiple threads. However, an Opster article stated that:

"When queries are run across different shards in parallel, they execute faster than an index composed of a single shard, but only if each shard is located on a different node and there are sufficient nodes in the cluster."

This confused me a bit, since it would imply that having more than one primary shard for a single index on a single node would not speed up search performance, even though multiple threads could be used to search the shards. I can imagine that it could have to do with some I/O bottleneck, but it seems to me that having a single shard with 130 GB of data would not be ideal.
So my question is whether having multiple shards on a single node could decrease search latency.

Also, any insight into what to look out for when optimizing search performance would be greatly appreciated. For example, would it make sense to run a multi-node deployment on the same machine?

Sorry for the wall of text, and thank you for your time!

The most important factor for optimising search performance is to try to make the full data set fit into the operating system page cache. Assuming the full host is dedicated to Elasticsearch, it looks like you have enough RAM (assuming a 30 GB heap) to achieve this, which is good.
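
As a quick sanity check, something like the following sketch (assuming the Python elasticsearch client and a node reachable on localhost:9200; adjust to your setup) reads the heap size and total RAM from the nodes stats API, so you can confirm that the memory left over for the page cache comfortably exceeds the ~130 GB data set:

```python
# Rough check, not authoritative: with a 30 GB heap on a 256 GB machine,
# roughly 220 GB is left to the OS page cache, well above the ~130 GB data set.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your endpoint

stats = es.nodes.stats(metric="jvm,os")
for node_id, node in stats["nodes"].items():
    heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
    total_ram = node["os"]["mem"]["total_in_bytes"]
    print(
        f"{node.get('name', node_id)}: "
        f"heap {heap_max / 2**30:.1f} GiB of {total_ram / 2**30:.1f} GiB RAM, "
        f"~{(total_ram - heap_max) / 2**30:.1f} GiB left for the page cache"
    )
```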

When you query these 5 shards, 5 tasks will be queued up and executed in parallel as long as there are threads available. The size of the threadpool depends on the number of CPU cores.
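
If you want to see the actual numbers, the cat thread pool API shows the configured search pool size per node; if I remember correctly, recent versions default it to int((allocated processors * 3) / 2) + 1, which on 48 threads would be around 73. A minimal sketch with the Python client (same assumed endpoint as above):

```python
# Small sketch: list the search thread pool per node. 'size' is the configured
# number of threads; 'active' and 'queue' show what is in flight right now.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your endpoint

print(es.cat.thread_pool(
    thread_pool_patterns="search",
    v=True,
    h="node_name,name,size,active,queue,rejected",
))
```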

When sizing the optimal number of shards it is important to know how many concurrent searches you need to support. Do you have any estimate for this?

Yes, it can. Having a small number of larger shards allows more queries to be served in parallel, while having a larger number of smaller shards can reduce the latency of individual queries at the expense of lower overall query throughput.

Unless you suffer from high heap usage, I do not think so.

Hi Christian,

Thank you for your quick response! Initially, I would like to benchmark the deployment using non-concurrent searches. So I'm guessing more shards are better, up to a certain point. In the future I would like to extend this to concurrent searches, so would a good rule of thumb be:
Number of shards = Threadpool size / number of concurrent searches?

If you optimize for low latency at one concurrent search request, you will determine a number of shards that most certainly will not be optimal when the number of concurrent searches increases. I would instead recommend the following:

  1. Try to determine the maximum shard size that still allows you to meet your latency requirements. Create an index with a single primary shard, add some realistic data to it, and benchmark the types of queries you need to support. Then add more data and benchmark the same queries again. This lets you plot search latency for different types of queries against shard size, which shows approximately how much data each shard can hold.
  2. Since you know the size of the data set, use the maximum workable shard size from the first step (or possibly a bit smaller) to determine how many shards you need. Then create an index with this number of primary shards and benchmark queries against it. Start with a low level of concurrent queries and gradually increase it, measuring latencies at each step. This lets you plot query latency against the number of concurrent queries and determine how many concurrent queries the node can handle while meeting latency requirements; a rough sketch of this step follows this list.
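
Here is a rough sketch of what the second step could look like in code. It is only an illustration: it assumes the Python elasticsearch client, a placeholder index name ("docs-bench") and a placeholder query, which you would replace with your real index and representative queries/aggregations:

```python
# Benchmark sketch: run the same search at increasing client-side concurrency
# levels and report latency percentiles for each level.
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your endpoint
INDEX = "docs-bench"                         # hypothetical index name, use your own
QUERY = {"match_all": {}}                    # replace with a realistic query/aggregation

def one_search() -> float:
    """Run a single search and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    es.search(index=INDEX, query=QUERY, size=0)
    return time.perf_counter() - start

def run_level(concurrency: int, requests: int = 200) -> None:
    """Fire `requests` searches using `concurrency` client threads, print p50/p95."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda _: one_search(), range(requests)))
    p50 = statistics.median(latencies)
    p95 = latencies[int(0.95 * len(latencies)) - 1]
    print(f"concurrency={concurrency:3d}  p50={p50 * 1000:7.1f} ms  p95={p95 * 1000:7.1f} ms")

for level in (1, 2, 4, 8, 16, 32):
    run_level(level)
```

For the first step you can reuse the same loop with a fixed concurrency of one, re-running it as you grow the single-shard index.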
