We have a 5-node cluster:
3 master-data nodes
2 data nodes
Master-data nodes have 8 cores and 28 GB RAM.
Data nodes have 4 cores and 14 GB RAM.
Approx. 120 million docs and 250 GB of data. Mostly used for logs (log4net).
My first question: so far we have been running daily indices. Yesterday we had almost 2k indices and 20k shards. I have a feeling that this is not very good for performance. On a scale from 1-10, how bad is it, and why?
Is the general setup with 3 master-data nodes and 2 data-only nodes good? Can it be tweaked?
Even though we have followed the general "go-to-production" guidelines with recovery_after_nodes and related settings, recovery takes several hours. Any guesses as to why?
Each shard needs resources just to exist, so a shard with 50GB of data carries the same fixed overhead as one that has 5MB.
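To put rough numbers on that, here is a quick back-of-the-envelope calculation using the figures from the question (250 GB, about 20k shards, about 2k indices, 5 nodes). It assumes the 20k figure counts every shard, primaries and replicas alike, and that shards are spread evenly across nodes; neither is stated in the question.

```python
# Figures quoted in the question; all of them are approximate.
total_data_gb = 250
total_shards = 20_000
total_indices = 2_000
nodes = 5

# Average amount of data held by each shard, in MB.
avg_shard_mb = total_data_gb * 1024 / total_shards

# Average shards per index and per node (assumes an even spread).
shards_per_index = total_shards / total_indices
shards_per_node = total_shards / nodes

print(f"average shard size: {avg_shard_mb:.1f} MB")
print(f"shards per index:   {shards_per_index:.0f}")
print(f"shards per node:    {shards_per_node:.0f}")
```

So each shard holds on the order of 13 MB while every node carries roughly 4,000 of them, which means most of the cluster's effort goes into the shards themselves and not into the data.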
You need to consolidate those indices: switch to weekly or monthly indices, reduce each one to a single shard, or both.
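As a rough sketch of what that consolidation could look like, the helper below groups daily index names into monthly targets and builds one request body per month in the shape the `_reindex` API expects (`source.index` and `dest.index`). The naming scheme `<prefix>-YYYY.MM.DD` and the function name `monthly_reindex_bodies` are assumptions for illustration, not something taken from your cluster, and the code only builds the bodies; it does not send anything.

```python
import re
from collections import defaultdict

# Assumed daily naming scheme: <prefix>-YYYY.MM.DD (hypothetical, adjust to yours).
DAILY = re.compile(r"^(?P<prefix>.+)-(?P<year>\d{4})\.(?P<month>\d{2})\.\d{2}$")


def monthly_reindex_bodies(daily_indices):
    """Group daily index names by month and build one _reindex body per month."""
    groups = defaultdict(list)
    for name in daily_indices:
        m = DAILY.match(name)
        if m is None:
            continue  # skip names that do not follow the assumed scheme
        target = f"{m['prefix']}-{m['year']}.{m['month']}"
        groups[target].append(name)
    return [
        {"source": {"index": sorted(sources)}, "dest": {"index": target}}
        for target, sources in sorted(groups.items())
    ]


bodies = monthly_reindex_bodies(
    ["logs-2016.03.01", "logs-2016.03.02", "logs-2016.04.01", "kibana-settings"]
)
for body in bodies:
    print(body)
```

Each body would then be sent as a `POST` to `_reindex`, assuming your version ships that API. Creating the monthly index beforehand with `number_of_shards` set to 1, and deleting the daily indices only after checking the document counts match, keeps the move reversible until the last step.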