Shards per node question

Good Day

I'm building a test cluster and was given basically the following:
3 Master Nodes
4 Data Nodes
A couple Client Nodes
A couple Ingestion nodes

My question is...

  1. Is the number-of-shards setting configured only on nodes that hold data, or should it be set on all nodes regardless of role?

  2. Our data nodes are each attached to 30+ TB of storage (there will be A LOT of read-only data ingested). As a general rule of thumb, how many shards per node would be "good"?

  3. Same as 2 but with replica sets?

Thx for any insight you all could provide

  1. The number of shards is configured per index for the entire cluster. It's not node-specific.
  2. It depends mainly on how you want to query your data. For time series data, the advantage of having a new index per day (with a new set of shards) is that you only have to keep shards in memory for recent indices, assuming you are applying some kind of date filter. If that is not the case and you really want to query everything, then you should align the number of shards per node with how much memory you have on those nodes.
  3. The more replicas you have, the more work happens on write and on cluster restart or rebalancing.
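
To make points 1 and 2 concrete, here is a rough sketch of what "per index" looks like with one index per day. It only builds the index name and the settings body you would send when creating the index; it does not talk to a cluster. The `logs` prefix, the helper names, and the 5/2 values are assumptions for illustration, not recommendations.

```python
import json
from datetime import date


def daily_index_name(prefix: str, day: date) -> str:
    # One index per day, e.g. "logs-2024.01.15".
    # The prefix is a made-up example, not something from this thread.
    return f"{prefix}-{day:%Y.%m.%d}"


def index_settings(shards: int, replicas: int) -> dict:
    # Body for a create-index request. These are per-index settings,
    # so they do not go into each node's config file.
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


name = daily_index_name("logs", date(2024, 1, 15))
body = index_settings(shards=5, replicas=2)  # illustrative values only

print(f"PUT /{name}")
print(json.dumps(body, indent=2))
```

Each new daily index gets its own set of shards, so the shard count can be changed for future indices without touching the existing ones.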

A good guideline is to have as few shards as you can get away with in terms of memory (depends on your mapping, amount of data, etc.) and resilience (2 replicas is pretty good but may not be enough for all).
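
To see how quickly shards add up, a back-of-the-envelope calculation can help. This is only a sketch: the 4 data nodes come from your description, while 5 primaries per index and 30 retained daily indices are assumed numbers picked for the example.

```python
def total_shard_copies(primaries: int, replicas: int, indices: int) -> int:
    # Every primary shard has `replicas` additional copies,
    # so each index holds primaries * (1 + replicas) shards.
    return primaries * (1 + replicas) * indices


def avg_shards_per_node(total: int, data_nodes: int) -> float:
    # Average only; the cluster may not balance shards perfectly evenly.
    return total / data_nodes


# Assumed example: 5 primaries, 2 replicas, 30 daily indices kept around.
total = total_shard_copies(primaries=5, replicas=2, indices=30)
print(total)  # 450

# 4 data nodes, as described in the original post.
print(avg_shards_per_node(total, data_nodes=4))  # 112.5
```

Lowering either the primaries per index or the number of replicas reduces that per-node figure directly, which is why "as few as you can get away with" is the usual advice.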

AWESOME INFO!!!

Question: I believe we are at 64 GB of RAM on the data node servers. Isn't the default 5 shards and 2 replicas per index? Is that a good number to go with? I just don't want too many, because I know that can cause issues too.

Thx for all the info you're giving me. It helps A LOT!
