Optimal node roles for 100-node cluster

Musab_Dogan · March 1, 2022, 1:24pm

A cluster of 100 nodes was installed. We want it to serve as a search engine like Google.
Servers are located in a data center.
RAID 0 was chosen to meet the high-speed requirement.
Query traffic will be more intense than indexing traffic.
Queries will combine boolean, wildcard, fuzzy, transposition, and function score clauses.
The data in the index is updated periodically (about once every 1-2 weeks).
There will be 1 replica for each index.

Node hardware
128 GB RAM
64 CPUs
8.8 TB SSD

Node roles
3 Masters
3 Coordinators + ingest
94 Data
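
For reference, a minimal sketch of how this layout could map onto the `node.roles` setting in each node's elasticsearch.yml. It assumes Elasticsearch 7.9 or later, where `node.roles` is available; the `ROLE_LAYOUT` dictionary and the `render_node_roles` helper are hypothetical names used only for illustration. Every node can coordinate requests, so a "coordinator + ingest" node is simply a node whose only listed role is `ingest`.

```python
# Hypothetical description of the layout from the post: group name -> node count and roles.
ROLE_LAYOUT = {
    "master": {"count": 3, "roles": ["master"]},
    # Coordination is implicit on every node, so only "ingest" is listed here.
    "coordinator_ingest": {"count": 3, "roles": ["ingest"]},
    "data": {"count": 94, "roles": ["data"]},
}


def render_node_roles(roles):
    """Return the elasticsearch.yml line for a list of roles.

    An empty list yields a coordinating-only node.
    """
    if not roles:
        return "node.roles: [ ]"
    return "node.roles: [ " + ", ".join(roles) + " ]"


# Sanity check: the three groups should add up to the 100 nodes in the cluster.
total_nodes = sum(group["count"] for group in ROLE_LAYOUT.values())

for name, group in ROLE_LAYOUT.items():
    print(f"{name} x{group['count']}: {render_node_roles(group['roles'])}")
print("total nodes:", total_nodes)
```

If the ingest role is later judged unnecessary, `render_node_roles([])` gives the line for a coordinating-only node.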

As recommended in best practice, a separate cluster will be set up for stack monitoring, and the data will be sent to that cluster with Metricbeat.
Ingest pipelines (including for stack monitoring) are not actively used. We chose coordinator + ingest in case we need them in the future.

Question 1
Does a coordinator + ingest node perform as well as a dedicated coordinating-only node when the ingest role is not actively used?

Question 2
Would you recommend putting a load-balancer in front of 3 coordinator nodes during indexing or querying?

Question 3
There are index sizes up to 60 TB. When calculated according to best practice:
"Aim for shard sizes between 10GB and 50GB"
60 TB index => shard count must be between 1200 and 6000
Is it OK to use 1500 shards for an index in a system with 100 nodes?
Note: Indexing is _id-based. The index is constantly being updated, so it could not be split across multiple indices without risking duplicate records.
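
The arithmetic behind this question can be sketched as follows. This is a rough calculation, not a sizing recommendation. It assumes that 60 TB is the primary (pre-replica) size, that sizes use decimal units (1 TB = 1000 GB), and that shards spread evenly across the 94 data nodes. The function names are hypothetical.

```python
import math

GB_PER_TB = 1000  # assumption: decimal units; use 1024 if the sizes are TiB/GiB


def shard_count_range(index_size_tb, min_shard_gb=10, max_shard_gb=50):
    """Return (fewest, most) primary shards that keep shard size within the guideline."""
    index_gb = index_size_tb * GB_PER_TB
    fewest = math.ceil(index_gb / max_shard_gb)   # largest allowed shards
    most = math.floor(index_gb / min_shard_gb)    # smallest allowed shards
    return fewest, most


def shard_layout(index_size_tb, primaries, replicas, data_nodes):
    """Estimate per-shard and per-node figures, assuming an even spread of shards."""
    total_shards = primaries * (1 + replicas)
    return {
        "shard_size_gb": index_size_tb * GB_PER_TB / primaries,
        "total_shards": total_shards,
        "shards_per_node": total_shards / data_nodes,
        "disk_per_node_tb": index_size_tb * (1 + replicas) / data_nodes,
    }


fewest, most = shard_count_range(60)
layout = shard_layout(60, primaries=1500, replicas=1, data_nodes=94)

print("primary shard range:", fewest, "to", most)
for key, value in layout.items():
    print(f"{key}: {value:.1f}")
```

Under these assumptions, 1500 primaries gives 40 GB shards, which is inside the 10GB to 50GB guideline, and 3000 shards in total once the replica is counted, or roughly 32 per data node.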

Question 4
We are considering using a (Turkish) dictionary stemmer for natural language processing, but we have performance concerns.
Do you have any suggestions?

Regards,
Musab

warkolm · March 1, 2022, 10:27pm

100 nodes is pretty large FWIW. We don't really see clusters this large these days, due to the use of things like CCS and CCR.

Question 1: Yes.

Question 2: If you can, yes.

Question 3: You should really test this with your data, your queries, your SLAs.

Question 4: Same as the previous question, you need to test this yourself.
