Is there any recommended ratio between the number of master, data, coordinator and ingestion nodes?

calin · December 13, 2023, 2:00pm

Currently working with an equal number of master, data and coordinator nodes (10). Need to add some ingestion nodes.

They all have 2 CPU/node, master and coordinator have 4 GB each, data has 16 GB.

I haven't done the sizing myself, I'm just taking over now.

Is there any recommendation for a ratio of ingestion to data nodes, or ingestion to coordinator nodes?

How about in terms of resources ?

Any help would be greatly appreciated.
Thanks.

leandrojmp · December 13, 2023, 2:06pm

Can you provide more context about this? How many nodes do you have, and what are their roles? It is not clear how many master nodes and how many data nodes you have.

Also, coordinating-only and ingest-only nodes are rarely required; normally only big clusters, where there is some measurable impact, need those nodes as dedicated.

calin · December 13, 2023, 2:36pm

Sorry, I meant 10 each. 10 master, 10 coordinator, 10 data.

leandrojmp · December 13, 2023, 2:47pm

You have 10 master nodes, 10 data nodes and 10 coordinator nodes with a total of 30 nodes in the cluster?

Can you run GET /_cat/nodes on Kibana Dev tools and share the response?

calin · December 13, 2023, 3:03pm

Yes, 30 nodes.

I'm not sure I am allowed to share such info and I'll have to see if I have access yet. Not sure if I mentioned, I'm new in the project, I wasn't the one who designed it, I'm generally new to sizing/hardware/infrastructure stuff.

I'm sorry for the lack of details, I wish I knew more myself.

leandrojmp · December 13, 2023, 3:26pm

The only information that could be seen as sensitive in the response of the GET /_cat/nodes request is the IP address of the nodes, and you can redact it.

It would help understand how your cluster is configured.
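If redacting by hand is a concern, a minimal sketch like the following could strip the addresses before posting. It assumes the output is the plain-text table returned by GET /_cat/nodes?v with IPv4 addresses; the sample rows, node names and the `redact_ips` helper are made up for illustration and are not from the actual cluster.

```python
import re

# Matches dotted-quad IPv4 addresses such as 10.0.3.17.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_ips(cat_nodes_output: str) -> str:
    """Replace every IPv4 address in the text with a fixed placeholder."""
    return IPV4.sub("x.x.x.x", cat_nodes_output)


# Hypothetical sample in the default column order of GET /_cat/nodes?v.
sample = """\
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.0.3.17           42          91   3    0.10    0.15     0.12 m         *      master-0
10.0.3.21           58          97  12    1.20    1.05     0.98 d         -      data-0
"""

print(redact_ips(sample))
```

The node.role and master columns are the interesting part here: they show which roles each node actually carries and which node is the elected master.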

But overall, having 10 masters and 10 coordinating-only nodes looks like a waste of resources.

A production cluster needs at least 3 master nodes, and you will probably not need more than that for small/medium clusters, so you probably will be able to reduce the number of master-dedicated nodes to 3.

Also, coordinating nodes are rarely needed, especially in small clusters. If you really have 10 coordinating-only nodes, you can probably remove them and point your clients (indexing and search) directly to the data nodes.

calin · December 13, 2023, 3:28pm

To give you more details from the few I know

Expected traffic is about 100 million logs/day, with average size of 1 KB/log. Retention policy of 1 year.

I don't think it's a small or medium cluster

Christian_Dahlqvist · December 13, 2023, 4:35pm

100 million documents per day of 1 kB each is about 95 GB of raw data per day. In a year that is about 34 TB. If we assume this is the amount of space it takes up on disk and that you have 1 replica shard, that means in the region of 68 TB of total storage. Sounds like a medium-sized cluster.
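The arithmetic above can be reproduced with a short sketch. It assumes, as the estimate does, that the on-disk size equals the raw size (no compression or indexing overhead); the function name and parameters are hypothetical. With 1024-based units it gives the figures above, and with 1000-based units it gives roughly 100 GB per day, 36.5 TB per year and 73 TB with one replica.

```python
def estimate_storage(docs_per_day, kb_per_doc, retention_days, replicas, unit=1024):
    """Return (daily GB, primary TB, total TB including replicas).

    `unit` is 1024 for binary units or 1000 for decimal units.
    Assumes on-disk size equals raw size, which is a simplification.
    """
    daily_bytes = docs_per_day * kb_per_doc * unit
    daily_gb = daily_bytes / unit**3
    primary_tb = daily_gb * retention_days / unit
    total_tb = primary_tb * (1 + replicas)
    return daily_gb, primary_tb, total_tb


for unit in (1024, 1000):
    daily, primary, total = estimate_storage(100_000_000, 1, 365, replicas=1, unit=unit)
    print(f"unit={unit}: {daily:.1f} GB/day, {primary:.1f} TB/year, {total:.1f} TB with replica")
```

Changing `retention_days` or `replicas` shows how quickly the total moves, which is usually what drives the number of data nodes.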

leandrojmp · December 13, 2023, 4:41pm

With 100 million logs/day at an average size of 1 KB, this leads to something around 100 GB per day, and with a retention of 1 year this ends up close to 36.5 TB. Adding replicas for redundancy, the total size would be close to 73 TB.

This is something of a medium size cluster, it is pretty similar in size to one that I manage.

The number of total nodes will depend on how you will organize your data, for example if you use a Hot/Warm architecture, you could have faster hot nodes with smaller disks and a little slower warm nodes with larger disks.

calin · December 13, 2023, 5:43pm

@leandrojmp @Christian_Dahlqvist

Seems like I underestimated the size of large clusters. Like I said, I don't have much experience with this, so thanks for the help, it's greatly appreciated.

Size is exactly in the middle of your estimations, it's about 70.5 GB.

Yes, we use hot/warm architecture, but the nodes are sized the same per type (all data nodes the same, all coordinating nodes the same).

Shards wise there are different numbers of shards/index per type of log and hot/warm/cold phase, but in total 80 indices and about 800 shards, with shard sizes between 15 and 70 GB.

@leandrojmp could I ask for the specs of the cluster you manage ?

leandrojmp · December 13, 2023, 5:46pm

Normally you would have hot nodes with more resources than warm nodes. I do not think you need coordinating-only nodes in this case, especially not 10 of them.

I have a similar cluster size. My hot nodes are also the ingest nodes, and all the clients only talk to the hot nodes; for this reason they have 64 GB of RAM and 16 vCPU. The warm nodes are much smaller, with 2 vCPU and 16 GB of RAM.

The disk size in this case is the same for both types of nodes, but hot nodes use fast SSDs and warm nodes are HDD-backed.

calin · December 13, 2023, 5:54pm

But how many of each ?
(by the way, I added some more info in the previous message)

And if you don't mind me asking, how do you deploy them ? In the Helm charts I only have:

elastic:
    master:
        replicaCount: 10
...
    data:
        replicaCount: 10
...
    coordinator:
        replicaCount: 10

Of course with memory and cpu values for each.

But there is no distinction between hot, warm, cold.

leandrojmp · December 13, 2023, 6:36pm

I use traditional VMs, I do not use Kubernetes, so I cannot help with helm charts.

calin · December 14, 2023, 1:05pm

I made a mistake earlier, the total size is 70.5 TB, not GB, obviously

@leandrojmp you mind me asking how many nodes of each you have (outside the 3 master nodes) ? If not exact number, maybe ball-park ?

Thank you and thank you for all the info you already provided.

system · January 11, 2024, 1:06pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
