We're heavily reliant on doc types now, but I know doc types are going away in 6.x, so I'm exploring the idea of creating one index for each of the doc types. Would a few hundred indices cause problems? Around 500 indices? How many is considered too many?

We have a 16-node cluster with about 10 TB of data, excluding replicas.
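For context, this is roughly what I'm sketching for the migration: reindexing each doc type out of a shared index into its own index with the `_reindex` API. Host, index names, and the type list below are placeholders, not our real setup.

```python
# Rough sketch only: copy each doc type from a multi-type index into its
# own single-type index via the _reindex API (type filtering in the
# source is supported on 5.x/6.x). All names here are placeholders.
import requests

ES = "http://localhost:9200"                    # assumed cluster endpoint
SOURCE_INDEX = "events"                         # hypothetical multi-type index
DOC_TYPES = ["click", "purchase", "pageview"]   # hypothetical doc types

for doc_type in DOC_TYPES:
    body = {
        "source": {"index": SOURCE_INDEX, "type": doc_type},
        "dest": {"index": f"{SOURCE_INDEX}-{doc_type}"},
    }
    resp = requests.post(f"{ES}/_reindex", json=body)
    resp.raise_for_status()
    print(doc_type, resp.json().get("total"), "docs copied")
```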
The primary concern is the shard count per node rather than the total number of indices. An acceptable number of shards per node can vary greatly depending on many factors.
Generally speaking, with data nodes having 30 GB heaps, you should be safe with 600 to 1,000 shards per node. The cluster can often operate safely with more than that, but you would need to pay very close attention to heap availability in Monitoring. If the heap graph still shows a gentle sawtooth pattern, with peaks around 70% of the heap and troughs around 50%, the node is healthy with regard to memory usage. If, however, the peaks are consistently above 70% and stay there rather than forming that sawtooth pattern, or garbage collection runs too frequently without freeing enough memory, that is an indication that your cluster and nodes are experiencing memory pressure.
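If it helps, here is a minimal sketch of pulling per-node heap usage from the nodes stats API so you can spot heaps that sit above ~70% without the sawtooth. The host and the 70% threshold are assumptions for illustration; Monitoring shows the same data graphically.

```python
# Minimal sketch: poll per-node JVM heap usage from the nodes stats API.
# A heap_used_percent that stays persistently above ~70% (no sawtooth)
# suggests memory pressure. Host is a placeholder.
import requests

ES = "http://localhost:9200"  # assumed cluster endpoint

stats = requests.get(f"{ES}/_nodes/stats/jvm").json()
for node_id, node in stats["nodes"].items():
    heap_pct = node["jvm"]["mem"]["heap_used_percent"]
    flag = "  <-- sustained high heap?" if heap_pct > 70 else ""
    print(f"{node['name']:<20} heap used: {heap_pct}%{flag}")
```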
This may already be familiar to you, and if so, you're well versed. The thing to be aware of is that an increased number of shards per node increases memory pressure. Keeping a cluster healthy as the number of indices (and therefore shards per node) grows frequently becomes a heap management exercise.
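One quick way to keep an eye on shards per node is the cat allocation API. A rough sketch, with the host and the shard-count guideline as assumptions:

```python
# Sketch: list shards per data node via the _cat/allocation API so you can
# track how close each node gets to the ~600-1,000 shards-per-node guideline.
import requests

ES = "http://localhost:9200"  # assumed cluster endpoint

rows = requests.get(f"{ES}/_cat/allocation?format=json").json()
for row in rows:
    if row.get("node") == "UNASSIGNED":
        continue  # skip the unassigned-shards summary row
    print(f"{row['node']:<20} shards: {row['shards']}")
```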
Thanks for the quick response! Through testing, we found that the optimal shard size for us is around 30 GB, so we have a guideline not to exceed that when creating indices. With the new design, I think we will end up with about 150 shards per node. And we'll be sure to monitor the stats you mentioned.
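For reference, this is the back-of-the-envelope math we apply when creating an index against that 30 GB guideline; the expected index size below is just an example figure, not one of our real indices.

```python
# Sizing sketch: pick enough primary shards so each stays under the ~30 GB
# target we settled on through testing. The expected size is an example.
import math

TARGET_SHARD_SIZE_GB = 30
expected_index_size_gb = 400  # hypothetical expected size of one index

primary_shards = math.ceil(expected_index_size_gb / TARGET_SHARD_SIZE_GB)
print(f"{primary_shards} primary shards keeps each under {TARGET_SHARD_SIZE_GB} GB")
# -> 14 primary shards
```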