We are trying to show metrics such as yield & harvest.
Yield is basically how much of the data is available for searching.
The problem with active_shards_percent_as_num is that it counts both primaries and replicas. But let's say an index is configured with one replica and none of the replicas are assigned. Then active_shards_percent_as_num is 50%. We can't use this number to denote the yield: all the primaries are there, so the yield should be 100%.
I don't see a way of calculating the percentage purely from the active primary shard count [how do I know how many primary shards there are in the cluster when things are messy?]
active_shards_percent_as_num gives you an accurate number for all the data that is searchable. It doesn't take indices with 0 replicas into account, because as long as the primary shards are available, what's the concern?
What's the perceived value in this metric you want to calculate?
Let's say I have an index foo with 5 primary shards and 1 replica of each. Now if the replicas are not assigned, active_shards_percent will be 50%. But since all the primaries are there, we basically have all the data, hence the yield is 100% [yield = available data / total data].
Now let's say only one replica was missing; then active_shards_percent would be around 90-99%. Even in this case the yield should be 100%. Basically I wouldn't want to consider replicas when calculating the yield.
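To make the arithmetic in the foo example concrete, here is a minimal sketch of how a percentage that counts both primaries and replicas behaves. The function name and parameters are made up for illustration; this is not Elasticsearch code, just the same ratio computed by hand.

```python
def active_shards_percent(active_primaries, active_replicas,
                          primaries, replicas_per_primary):
    """Share of all shard copies (primaries + replicas) that are active.

    Hypothetical helper: it only mirrors the ratio discussed in this thread.
    """
    total_copies = primaries * (1 + replicas_per_primary)
    if total_copies == 0:
        return 100.0  # assumption: an empty cluster counts as fully active
    return 100.0 * (active_primaries + active_replicas) / total_copies


# Index foo: 5 primaries, 1 replica each, no replica assigned.
no_replicas_assigned = active_shards_percent(5, 0, 5, 1)

# Same index with 4 of the 5 replicas assigned.
one_replica_missing = active_shards_percent(5, 4, 5, 1)

print(no_replicas_assigned, one_replica_missing)
```

In both cases every primary is active, so a yield based only on primaries would be 100% even though this number drops to 50% or 90%.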
With 0 replicas, yes, we can directly use the active_shards_percent value.
active_primary_shard_percentage = available primaries / total primaries
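A rough sketch of that ratio, assuming the total number of primaries can be obtained by summing each index's `index.number_of_shards` setting and the numerator comes from the `active_primary_shards` field of the cluster health response. How those values are fetched is left out, and the function and argument names are hypothetical.

```python
def active_primary_shard_percentage(active_primary_shards, primaries_per_index):
    """Yield based on primaries only: available primaries / total primaries.

    active_primary_shards: count of active primaries (assumed to come from
        the cluster health response).
    primaries_per_index: dict of index name -> configured primary count
        (assumed to come from each index's number_of_shards setting).
    """
    total_primaries = sum(primaries_per_index.values())
    if total_primaries == 0:
        return 100.0  # assumption: no indices means nothing is missing
    return 100.0 * active_primary_shards / total_primaries


# foo has 5 primaries; replicas play no part in this number.
all_primaries_active = active_primary_shard_percentage(5, {"foo": 5})
one_primary_missing = active_primary_shard_percentage(4, {"foo": 5})

print(all_primaries_active, one_primary_missing)
```

With all five primaries active this gives 100% regardless of replica state, and it only drops when a primary itself is unavailable.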
So whenever a cluster is RED, we want to tell our clients how much of the data is available. If we display active_shards_percent, then when no replicas are assigned it will show 50%, but that's not the case after all [all the primaries will still provide you all the data you need].