Cluster design with fully replicated data - please advise

Hattivat · December 18, 2017, 12:56pm

I have what seems to be a rather atypical use case for ES - a relatively small (<100 GB) dataset, small enough that each server can hold a full copy of the data. The cluster will have to handle a moderate amount of new data, but this will be balanced by deletions of old data, so assuming I schedule regular force merges, the size of the dataset should stay pretty stable. I also need to support a high number of searches, ideally over a thousand per second, so I want to optimize for that.

I've searched all over the internet for advice on ES cluster design and the commonly approved "best practice" seems to be to have:
3 small servers as master nodes
2+ big servers as data nodes, with more added as data volume grows
1-2 http nodes

However, all of this advice seems designed for a typical ES use case that is quite different from my own - I don't have a massive dataset that I would need to spread across multiple servers, with all the coordination overhead that entails, and I don't expect this dataset to grow much, so if I add more nodes it would be to increase search throughput, not to store more data.

So, my question is - does it make sense to keep all three node functions (http, data, master) separate in my case? I am especially unsure about the separate http nodes. Each data node has the complete dataset and is able to fully answer any query on its own, so there is no need to split queries or reduce answers. Is it still worth having them?

The design I am considering now is:
3 small servers as master nodes
3+ big servers as data/http nodes, each with the full dataset
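
To make the proposal concrete, here is a rough sketch of how this layout might be written down. It assumes the ES 5.x/6.x-style node settings (`node.master`, `node.data`, `node.ingest`) and a single-shard index with `index.auto_expand_replicas` set to `0-all`, so that every data node ends up with a copy. The role names `master` and `data_http` and both helper functions are made up for illustration. This only builds the settings as plain dicts and does not talk to a cluster.

```python
import json


def node_settings(role):
    """Return elasticsearch.yml-style settings for a role.

    The role names are hypothetical labels for the two server types
    in the proposed design; the setting keys are the ES 5.x/6.x ones.
    """
    if role == "master":
        # Small dedicated master-eligible node: holds no data.
        return {"node.master": True, "node.data": False, "node.ingest": False}
    if role == "data_http":
        # Big node: holds the data and takes client requests directly.
        return {"node.master": False, "node.data": True, "node.ingest": False}
    raise ValueError("unknown role: %r" % role)


def replicated_index_settings():
    """Index-creation body for one primary shard replicated to all data nodes."""
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                # Replica count follows the number of data nodes.
                "auto_expand_replicas": "0-all",
            }
        }
    }


if __name__ == "__main__":
    for role in ("master", "data_http"):
        print(role, json.dumps(node_settings(role)))
    print(json.dumps(replicated_index_settings(), indent=2))
```

With a single primary shard and replicas expanding to all data nodes, adding a data node would add another full copy, which matches the goal of scaling search throughput rather than storage.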

Would that work fine? Also, am I correct to assume that with all data nodes having identical data, the cluster state will be tiny and the master nodes can thus be really small?

system · January 15, 2018, 12:56pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
