Optimizing for reads with very small and stable index

Meisseli · March 24, 2023, 9:00am

Hello!

I'm working with a very small index. It holds only 65,000 documents and is about 50 MB in size. The index is also very "stable": it is written only once a day and receives no writes in between. Because of that, I do not have to worry about indexing performance.

My goal is to maximize performance for search: number of concurrent searches and the search latency. High availability is a nice bonus, but not the most important part.

I have read an extensive number of guides, documentation pages and tutorials on this subject. I have also benchmarked several different setups. However, since my use case with a very small index seems to be so uncommon, I do not know which "basic setup" to start with. For example, it is usually recommended to have 1 shard per 40 GB of data. But on the other hand, there should be at least 1 shard per node...

I currently have a 3-node cluster with 1 shard and 3 read replicas. I have also experimented with and benchmarked other options. I almost always end up with a somewhat unbalanced setup, with only 2 nodes actually taking the load and 1 staying idle.

What would you recommend as a basic starting setup for my use case? Number of nodes? Number of shards? Number of replicas? Anything else I should know?

Christian_Dahlqvist · March 24, 2023, 9:11am

You probably want a 3-node cluster where all nodes have the same profile (master/data). As you have a single small index, I would stick with 1 primary shard and 2 replica shards. Make sure your clients are set up to load balance requests across all three nodes in the cluster. You may also consider using the '_local' preference (I do not think this is the default).
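A minimal Python sketch of the two settings suggested above, using only the standard library. It builds the JSON body for creating the index with 1 primary and 2 replica shards, and a search URL that passes preference=_local. The index name my-index and the host node1:9200 are hypothetical placeholders, and the helper function names are illustrative only; check the preference values against the search API documentation for your Elasticsearch version.

```python
import json
from urllib.parse import urlencode

# Hypothetical index name; replace with your own.
INDEX = "my-index"


def index_settings_body(replicas: int = 2) -> str:
    """JSON body for `PUT /<index>`: one primary shard plus `replicas` replica copies."""
    return json.dumps({
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": replicas,
        }
    })


def search_url(host: str, index: str = INDEX, preference: str = "_local") -> str:
    """URL for `GET /<index>/_search`, asking the receiving node to prefer its local shard copies."""
    query = urlencode({"preference": preference})
    return f"http://{host}/{index}/_search?{query}"


if __name__ == "__main__":
    # Hypothetical host; send these with any HTTP client.
    print("PUT", f"http://node1:9200/{INDEX}")
    print(index_settings_body())
    print("GET", search_url("node1:9200"))
```

With '_local', a node that holds a copy of the shard answers from that copy when it can instead of forwarding the search, so the benefit only shows up if requests are actually spread over all three nodes.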

This way, all nodes hold a copy of the data and can serve it locally. As there is a single primary shard, you optimize the number of concurrent searches the cluster can handle, and the shard size should not cause any performance issues.
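The arithmetic behind "all nodes hold a copy" can be sketched in Python. The 40 GB-per-shard figure is the rule of thumb quoted in the question, not a hard limit, and the function names are illustrative. The sketch also encodes one allocation rule Elasticsearch does enforce: two copies of the same shard are never placed on the same node, so copies beyond the node count stay unassigned.

```python
import math

GB = 1024 ** 3
TARGET_SHARD_BYTES = 40 * GB  # rule-of-thumb shard size quoted in the question


def primaries_for(index_bytes: int) -> int:
    """Primary shards suggested by the size heuristic (never fewer than one)."""
    return max(1, math.ceil(index_bytes / TARGET_SHARD_BYTES))


def copies_per_shard(replicas: int) -> int:
    """Each shard has one primary copy plus `replicas` replica copies."""
    return 1 + replicas


def unassigned_copies(replicas: int, nodes: int) -> int:
    """Copies of one shard that cannot be placed, since a node holds at most one copy of it."""
    return max(0, copies_per_shard(replicas) - nodes)


if __name__ == "__main__":
    index_bytes = 50 * 1024 ** 2  # the ~50 MB index from the question
    print("primaries:", primaries_for(index_bytes))
    print("copies with 2 replicas:", copies_per_shard(2))
    print("unassigned with 3 replicas on 3 nodes:", unassigned_copies(3, 3))
```

For a 50 MB index the size heuristic gives a single primary, and 2 replicas on 3 nodes give exactly one copy per node. By the same rule, 1 primary with 3 replicas on 3 nodes leaves one replica copy unassigned.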

Meisseli · March 24, 2023, 11:40am

Thank you for your brief answer!

A couple of follow-ups, if I may:

I have not defined node.roles at all at the moment, so I think all nodes currently have multiple roles by default (master, data, data_content, data_hot, ingest, ml, etc.). Should I explicitly set node.roles to [ master, data ] on all nodes instead of this default setup? Do I need to mark all nodes as master, or just one node?
I'm using the REST API instead of a "direct" client library. I think load balancing is set up there out of the box, am I right?

I'm excited to benchmark these new settings! It is great to get a decent starting point, so thank you very much for your input!

Christian_Dahlqvist · March 28, 2023, 9:41am

You can leave all roles enabled for all nodes.

The client determines which node or nodes it connects to, so this will depend on the client configuration and design.
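With plain REST calls, each request goes to whichever node its URL names, so spreading the load is up to the caller (or a load balancer in front of the cluster). A minimal round-robin sketch in Python follows; the host names and the RoundRobinNodes class are hypothetical, and the official language clients have their own node-selection options, so check your client's documentation before rolling your own. The example only builds URLs, so it runs without a cluster.

```python
import itertools
from typing import Iterable, Iterator


class RoundRobinNodes:
    """Cycles through cluster nodes so consecutive requests land on different nodes.

    Not guaranteed thread-safe as written; guard next_host() with a lock if the
    instance is shared across threads.
    """

    def __init__(self, hosts: Iterable[str]):
        self._hosts = list(hosts)
        if not self._hosts:
            raise ValueError("at least one host is required")
        self._cycle: Iterator[str] = itertools.cycle(self._hosts)

    def next_host(self) -> str:
        return next(self._cycle)

    def search_url(self, index: str) -> str:
        # Pair round-robin node choice with '_local' so each node answers from its own copy.
        return f"http://{self.next_host()}/{index}/_search?preference=_local"


if __name__ == "__main__":
    nodes = RoundRobinNodes(["node1:9200", "node2:9200", "node3:9200"])  # hypothetical hosts
    for _ in range(4):
        print(nodes.search_url("my-index"))
```

Round robin is the simplest choice for three identical nodes that each hold a full copy of the index; it does not account for a node being slow or down, which is where a real client's retry and node-health handling come in.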

system · April 25, 2023, 9:42am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
