We have a 5-node cluster:
3 master-data nodes
2 data nodes
Master-data nodes have 8 cores and 28 GB RAM.
Data nodes have 4 cores and 14 GB RAM.
Approx. 120 million docs and 250 GB of data. Mostly used for logs (log4net).
My first question: we have so far been running daily indices. Yesterday we had almost 2k indices and 20k shards. I have a feeling that this is not very good for performance. On a scale from 1 to 10, how bad is it, and why?
Is the general setup with 3 master-data nodes and 2 data-only nodes good? Can it be tweaked?
Even though we have followed the general "go-to-production" guidelines with recovery_after_nodes and those settings, recovery takes several hours. Any guesses about that?
Best regards, Mats
Each shard needs resources to just exist. So a shard with 50GB of data uses as much of those resources as one that has 5MB.
You need to consolidate those indices: switch to weekly or monthly indices, reduce each index to a single shard, or both.
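As a rough sketch of what consolidating could look like, the snippet below groups daily index names into monthly targets and builds one `_reindex` request body per month. The naming scheme `logs-YYYY.MM.DD` is only an assumption for illustration (the thread does not say how the indices are named), and the `_reindex` API is only available on Elasticsearch 2.3 or later. The code just builds and prints the request bodies; it does not talk to a cluster.

```python
import json
from collections import defaultdict


def monthly_target(daily_index):
    """Map a daily index name to its monthly target.

    Assumes a hypothetical naming scheme such as "logs-2016.01.15",
    which becomes "logs-2016.01" (the trailing ".DD" is dropped).
    """
    prefix, _, day = daily_index.rpartition(".")
    if not (prefix and len(day) == 2 and day.isdigit()):
        raise ValueError("unexpected index name: " + daily_index)
    return prefix


def reindex_bodies(daily_indices):
    """Build one _reindex request body per monthly target index."""
    groups = defaultdict(list)
    for name in daily_indices:
        groups[monthly_target(name)].append(name)
    return {
        target: {"source": {"index": sorted(sources)}, "dest": {"index": target}}
        for target, sources in groups.items()
    }


if __name__ == "__main__":
    names = ["logs-2016.01.30", "logs-2016.01.31", "logs-2016.02.01"]
    for target, body in sorted(reindex_bodies(names).items()):
        print("POST _reindex", json.dumps(body))
```

If you go this way, create each monthly index up front with a low `number_of_shards` so it does not inherit a high default, and delete the daily indices only after checking the document counts match.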
Thanks. Started doing that and can already see an increase in performance.
Is there some good rule of thumb when it comes to total memory and the number of shards and indices?
Keep shards under 50GB, how much under really depends on your performance requirements.
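To turn that into a number, here is a minimal sketch that works out how many primary shards an index needs to stay under a given shard size. The 50 GB ceiling is the figure from above; the 25 GB value in the example is just an assumed, more conservative target, not a recommendation.

```python
import math


def primary_shards_needed(index_size_gb, max_shard_gb=50.0):
    """Smallest number of primary shards that keeps each shard under the cap."""
    if index_size_gb < 0 or max_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Every index needs at least one primary shard, however small it is.
    return max(1, math.ceil(index_size_gb / max_shard_gb))


if __name__ == "__main__":
    print(primary_shards_needed(250))        # all 250 GB in one index, 50 GB cap
    print(primary_shards_needed(250, 25.0))  # same data, assumed 25 GB target
    print(primary_shards_needed(0.005))      # a 5 MB index still needs one shard
```

With the 250 GB mentioned in this thread, even a single index holding everything would need only a handful of primary shards, which is why one shard per time-based index is usually enough at this volume.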
Ok, we are still far from that in terms of data but will keep that in mind.
Any changes to size and configuration of the nodes? Dedicated master nodes?
Down to 115 indices and 1140 shards. Still too many in your opinion?
It depends on how large the shards are, but much better!
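For a rough feel of the numbers in this thread, the sketch below compares average shard size and shards per node before and after the consolidation. It assumes the 250 GB is spread evenly over all shards and all 5 nodes, which is a simplification: the thread does not say whether that figure includes replicas, and real shards will vary in size.

```python
def avg_shard_mb(total_gb, shard_count):
    """Average shard size in MB, assuming data is spread evenly."""
    return total_gb * 1024 / shard_count


def shards_per_node(shard_count, node_count):
    """Average number of shards each node has to hold."""
    return shard_count / node_count


if __name__ == "__main__":
    total_gb, nodes = 250, 5  # figures from the original post
    for label, shards in (("before", 20000), ("after", 1140)):
        print(
            label,
            round(avg_shard_mb(total_gb, shards), 1), "MB per shard,",
            round(shards_per_node(shards, nodes)), "shards per node",
        )
```

Under those assumptions the average shard goes from roughly 13 MB to roughly 225 MB, and each node goes from about 4000 shards to about 228, so there is still plenty of room below the 50 GB ceiling to consolidate further.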