My ES cluster has grown over the past few months. My drives are now 90% full, so I am adding more. While I am on the topic of managing my cluster, I was wondering about maximum shard sizes. The biggest index reports 6 billion documents and 4 TB spread across 24 shards, with each shard approaching 200 GB.
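For reference, this is roughly how I am pulling those per-shard numbers (a sketch assuming the _cat/shards API on a node at localhost:9200; the index name is a placeholder):

```python
# List size and doc count for every shard of the big index.
# Assumes a node reachable at localhost:9200; "bigindex" is a placeholder name.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/shards/bigindex",
    params={"format": "json", "bytes": "gb",
            "h": "index,shard,prirep,docs,store,node"},
)
for shard in resp.json():
    # "store" is reported in GB because of bytes=gb
    print(shard["shard"], shard["prirep"], shard["docs"], shard["store"], shard["node"])
```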
The 200 GB shard size makes moving a shard a slow process. When I add a node, it can take over an hour of copying (with cluster.routing.allocation.cluster_concurrent_rebalance: 1) before the new node can take over any of the query load. I believe smaller shards would replicate faster, allowing the new node to participate sooner. Am I right about this?
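In case it matters, this is more or less how I set that and watch the copy progress while the new node pulls shards (a sketch assuming the cluster settings and _cat/recovery APIs on localhost:9200):

```python
# Keep rebalancing to one shard at a time, then watch active recoveries.
# Assumes a node reachable at localhost:9200.
import requests

requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"transient": {
        "cluster.routing.allocation.cluster_concurrent_rebalance": 1}},
)

# Shows which shard is currently copying to the new node and how far along it is.
print(requests.get(
    "http://localhost:9200/_cat/recovery",
    params={"v": "true", "active_only": "true"},
).text)
```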
How small should the shards be? 20 GB shards would copy over in about 1/10 the time (roughly 6 minutes), but that would mean 240 shards, which sounds excessive! What are the memory requirements for a node (node.master: true, node.data: false) that must aggregate query results from that many shards?
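For what it's worth, here is the back-of-envelope arithmetic behind those numbers (purely illustrative; the copy rate is just what 200 GB in about an hour implies):

```python
# Rough numbers behind the "about 6 min" and "240 shards" estimates.
current_shard_gb = 200
minutes_per_shard = 60                 # "over an hour" per 200 GB shard
copy_rate_mb_s = current_shard_gb * 1024 / (minutes_per_shard * 60)   # ~57 MB/s

target_shard_gb = 20
copy_minutes = target_shard_gb * 1024 / copy_rate_mb_s / 60           # ~6 minutes
shard_count = 24 * current_shard_gb // target_shard_gb                # 240 shards

print(round(copy_rate_mb_s), round(copy_minutes), shard_count)
```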
On the subject of hard limits: I read that a Lucene index (i.e., a single shard) has a roughly 2 billion document limit. Good, I am far from hitting that. I also read that ES 2.0+ will keep each shard on a single data path. Does that mean each data path must be larger than my ~200 GB shards?
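Regarding the 2 billion figure, this is the quick arithmetic that makes me think I am fine (assuming the limit applies per shard, since each shard is one Lucene index):

```python
# Per-shard doc count versus Lucene's per-index cap (roughly Integer.MAX_VALUE;
# the exact limit is slightly lower).
total_docs = 6_000_000_000
shards = 24
lucene_limit = 2_147_483_647

docs_per_shard = total_docs / shards                    # ~250 million per shard
print(docs_per_shard, docs_per_shard / lucene_limit)    # ~12% of the cap
```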
Thanks