Number of shards

saif · July 29, 2015, 12:07pm

Hello,
I have an ES cluster running on 3 nodes (3 servers):
2 master nodes and 3 data nodes.

rsyslog-2015.07.28
size: 4.72Gi (7.55Gi)
docs: 30,477,510 (30,477,510)

I want to optimize the number of shards per node.
Currently I have 1 replica and 5 shards.

I'm thinking of using 6 shards:
node 1: 0+3+R5+R2
node 2: 1+4+R4+R1
node 3: 2+5+R3+R0
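For comparison, here is a minimal sketch of one valid 6-shard, 1-replica layout on 3 nodes. It assumes a simplified round-robin model (primary of shard i on node i mod 3, its replica on the next node). That is not how Elasticsearch's allocator actually decides placement, but it respects the real rule that two copies of the same shard are never put on the same node. The helper name `round_robin_layout` is hypothetical.

```python
from collections import defaultdict


def round_robin_layout(num_shards: int, num_replicas: int, nodes: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: place every copy of every shard on a distinct node, round-robin.

    Simplified model only; Elasticsearch's real allocator balances shards its own way.
    """
    if num_replicas >= len(nodes):
        # Two copies of the same shard may never share a node.
        raise ValueError("need more data nodes than replicas")
    layout: dict[str, list[str]] = defaultdict(list)
    for shard in range(num_shards):
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]  # copy 0 = primary, others = replicas
            layout[node].append(str(shard) if copy == 0 else f"R{shard}")
    # List primaries first, then replicas, each in shard order (same notation as the plan above).
    return {
        node: sorted(copies, key=lambda s: (s.startswith("R"), int(s.lstrip("R"))))
        for node, copies in layout.items()
    }


for node, copies in round_robin_layout(6, 1, ["node 1", "node 2", "node 3"]).items():
    print(f"{node}: {'+'.join(copies)}")
```

This prints `node 1: 0+3+R2+R5`, `node 2: 1+4+R0+R3` and `node 3: 2+5+R1+R4`: every node holds 2 primaries and 2 replicas, and no node holds both copies of any shard.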

So what is the best number of shards for my cluster? 6?
Thank you

warkolm · July 29, 2015, 10:27pm

You have 3 data nodes, right? The way you have put it is a little confusing.

saif · July 30, 2015, 8:08am

Hi,
yes, 3 nodes,
6 shards and 1 replica.
Each node runs on:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Stepping: 6
CPU MHz: 2499.861
BogoMIPS: 4999.29
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 6144K
NUMA node0 CPU(s): 0-7

plus 4 TB of storage (RAID 0) and 8 GB of RAM.

Is that OK?

In the head plugin:
node 1: (2) 3 4 (5)
node 2: (0) 1 2 (3)
node 3: 0 (1) (4) 5
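A quick way to sanity-check a layout like the head plugin output above (reading the third row as node 3) is to count shard copies: with 1 replica, each shard should appear exactly twice in the cluster and never twice on the same node. The sketch below assumes the parentheses only mark which copy is which, so it ignores them; the function name `check_layout` is hypothetical.

```python
import re
from collections import Counter


def check_layout(layout: dict[str, str], num_shards: int, copies_per_shard: int) -> list[str]:
    """Hypothetical helper: return the problems found in a layout (empty list means OK)."""
    problems = []
    total = Counter()
    for node, row in layout.items():
        shards = [int(s) for s in re.findall(r"\d+", row)]  # "(2) 3" and "R2+3" both give [2, 3]
        for shard, count in Counter(shards).items():
            if count > 1:
                problems.append(f"{node} holds {count} copies of shard {shard}")
        total.update(shards)
    for shard in range(num_shards):
        if total[shard] != copies_per_shard:
            problems.append(f"shard {shard} has {total[shard]} copies, expected {copies_per_shard}")
    return problems


head_view = {
    "node 1": "(2) 3 4 (5)",
    "node 2": "(0) 1 2 (3)",
    "node 3": "0 (1) (4) 5",
}
print(check_layout(head_view, num_shards=6, copies_per_shard=2) or "layout looks valid")
```

The head plugin layout passes. Running the same check on the plan from the first post (`"node 2": "1+4+R4+R1"` and so on) reports that node 2 would hold 2 copies of shard 1 and 2 copies of shard 4, which Elasticsearch would not allow.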

Do you think that's OK for 30,477,510 docs per day?

warkolm · July 30, 2015, 10:24am

I think that'll be OK. You could even reduce that to 3 shards easily; it's not a lot of data.

saif · July 30, 2015, 11:09am

Hi,
I have 30 million docs per day.
I don't think 3 shards can support that number of documents.

Does anyone know how to calculate the number of shards?

warkolm · July 30, 2015, 11:59am

30 million docs is not a lot, 3 shards can deal with that very easily.
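To put this in numbers, the sketch below divides the daily index stats from the first post (30,477,510 docs, 4.72 GiB of primary data) across different primary shard counts. It assumes documents are spread evenly across shards and that every day looks like the sample day; the helper name `per_shard` is hypothetical.

```python
# Daily index stats quoted in the first post (rsyslog-2015.07.28).
DOCS_PER_DAY = 30_477_510
PRIMARY_GIB_PER_DAY = 4.72  # size without replicas


def per_shard(num_primaries: int) -> tuple[int, float]:
    """Average docs and GiB per primary shard, assuming an even spread."""
    return DOCS_PER_DAY // num_primaries, PRIMARY_GIB_PER_DAY / num_primaries


for shards in (3, 5, 6):
    docs, gib = per_shard(shards)
    print(f"{shards} shards: ~{docs:,} docs and ~{gib:.2f} GiB per primary shard")
```

With 3 shards, each primary holds about 10.2 million docs and about 1.57 GiB per day, which is the scale warkolm is calling "not a lot".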

terrasacer · July 30, 2015, 12:16pm

If there are 5 shards on 3 data nodes, 2 replicas are unnecessary because they won't all be allocated.

saif · July 30, 2015, 12:26pm

30 million per day,
and after 12 months that's
365 * 30 million ≈ 11 billion docs
?...
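A rough one-year projection, assuming one daily index per day, all of them kept for 365 days, 1 replica, and every day looking like the sample day in the first post (7.55 GiB including the replica). The constants come from that post; the helper name `yearly_totals` is hypothetical.

```python
# Figures from the rsyslog-2015.07.28 index stats in the first post.
DOCS_PER_DAY = 30_477_510
GIB_PER_DAY_WITH_REPLICA = 7.55  # total size: primaries + 1 replica
COPIES = 2                       # 1 primary + 1 replica per shard
DAYS = 365
DATA_NODES = 3


def yearly_totals(primaries_per_index: int) -> dict[str, float]:
    """Totals after a year of daily indices, assuming every day looks like the sample day."""
    shards_total = primaries_per_index * COPIES * DAYS
    gib_total = GIB_PER_DAY_WITH_REPLICA * DAYS
    return {
        "docs": DOCS_PER_DAY * DAYS,
        "gib_total": gib_total,
        "gib_per_node": gib_total / DATA_NODES,
        "shards_total": shards_total,
        "shards_per_node": shards_total / DATA_NODES,
    }


for primaries in (3, 5, 6):
    t = yearly_totals(primaries)
    print(f"{primaries} primaries/day: {t['docs']:,} docs, ~{t['gib_total']:.0f} GiB "
          f"(~{t['gib_per_node']:.0f} GiB per node), {t['shards_total']} shards "
          f"(~{t['shards_per_node']:.0f} per node)")
```

The data volume is the same whatever the shard count: about 11.1 billion docs and roughly 2,756 GiB, i.e. about 919 GiB per node, against roughly 3,725 GiB if "4TB" means 4 × 10^12 bytes. What changes with the shard count is the number of shards: 6 primaries per daily index gives 4,380 shards after a year (1,460 per node), while 3 primaries gives half that.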

saif · July 30, 2015, 12:31pm

Sorry, yes, 1 replica.

rodrigo · July 30, 2015, 7:40pm

"Enough" is a very subjective concept... What are your expectations? 30 million docs/day is also subjective (in general, a breeze for ES to handle): what is in your documents? (At the extreme, a document can contain just an id, or a full book or log dump, etc.) Are you trying to optimize search speed (do you have an SLA)? Are you worried about scaling up? Are you keeping the docs forever, using ES as storage? I think if you have better answers to these questions, you can get better responses.

If you have daily data, you can use a daily index (foo_20150730, foo_20150801) and an alias to add flexibility to your setup. The number of shards and hosts always depends on your use case; the number of replicas is tied to your failure tolerance.
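A minimal sketch of the daily-index-plus-alias idea: build the request body for Elasticsearch's `POST /_aliases` endpoint, which takes a list of `add`/`remove` actions applied together. It assumes a hypothetical alias `foo_last30d` that should always cover the last 30 daily `foo_YYYYMMDD` indices, and that the index dropping out of the window actually exists (removing an alias from a missing index is an error). It only prints the JSON; sending it to the cluster is left out.

```python
import json
from datetime import date, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Daily index name in the foo_YYYYMMDD style used in the post above."""
    return f"{prefix}_{day:%Y%m%d}"


def alias_actions(prefix: str, alias: str, today: date, keep_days: int) -> dict:
    """Body for POST /_aliases: add today's index to the alias, drop the one leaving the window.

    Hypothetical helper; assumes the index from `keep_days` ago exists.
    """
    return {
        "actions": [
            {"add": {"index": daily_index_name(prefix, today), "alias": alias}},
            {"remove": {"index": daily_index_name(prefix, today - timedelta(days=keep_days)),
                        "alias": alias}},
        ]
    }


print(json.dumps(alias_actions("foo", "foo_last30d", date(2015, 7, 30), keep_days=30), indent=2))
```

Searches then go to `foo_last30d`, while each day's documents are written to that day's index, so the shard count can be changed for new daily indices without touching old ones.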