The first is cardinality, which helps decide if using terms aggregation is a reasonable thing to do. The second is "multi" which counts the number of terms a single document property can have. These "multi-valued" properties must be treated different from single-values properties when aggregating.
{"script":"doc[\"task.dependencies\"].values.size()"} is slow. Is there a faster way to detect if a property has been given more-than-one value?
Since it is uncommon for single-valued properties to suddenly have multiple values, and multi-valued properties are commonly multi-valued (like text type), then I need not scan all the data; performing this aggregation on only a recent subset. I make a new index for each week of data. I chose a property that limits my query to recent data. I assume the script is only executed on a single index (or two), rather than all indexes. This reduces the script workload, and the query runs faster.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.