Is it possible to get the total number of unique facet values without
getting all of them?
For example, there may be thousands of unique facet values for a
particular facet, and while I am getting only the top 10 facet values
with counts, I also want to know how many of the unique facet values
remain (for example, for pagination).
Note that this is different from the total facet count currently
returned. As far as I understand, the information should be available
internally. It would be great to be able to get it out of the system.
This one is tricky, because we do a "scatter gather" type computation of
the terms facet. We could potentially get a unique number per shard, and
then the result would be an approximation.
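To illustrate why per-shard unique counts only give an approximation, here is a minimal sketch in plain Python. It does not use any Elasticsearch API; the function name and the list-of-lists shard layout are assumptions made for illustration. Because the same term can live on several shards, the per-shard numbers only bound the true global count: it is at least the largest per-shard count and at most their sum.

```python
def per_shard_unique_bounds(shard_terms):
    """Return per-shard distinct counts plus lower/upper bounds on the
    global distinct count.

    shard_terms is a hypothetical stand-in for the term values held by
    each shard: one iterable of terms per shard.
    """
    per_shard = [len(set(terms)) for terms in shard_terms]
    lower = max(per_shard, default=0)  # every shard's terms are in the global set
    upper = sum(per_shard)             # reached only if no term is shared
    return per_shard, lower, upper


def exact_unique(shard_terms):
    """Exact global distinct count; requires gathering every term."""
    seen = set()
    for terms in shard_terms:
        seen.update(terms)
    return len(seen)


shards = [["a", "b", "c"], ["b", "c", "d"], ["e"]]
per_shard, lower, upper = per_shard_unique_bounds(shards)
print(per_shard, lower, upper, exact_unique(shards))
```

The exact figure needs every term to be shipped to one place, which is precisely the cost the question is trying to avoid.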
Yes, I realized that it is much more difficult to do in a
distributed system. In my system, the data follows a power-law
distribution, so I used the facet statistics that Elasticsearch
currently gives me (top N counts, total count, and other count) to fit
the distribution and estimate an approximate unique count, which
does a reasonable job.
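One possible way to do this kind of estimate is sketched below. It is not necessarily the exact procedure described above, and all function names are hypothetical. It assumes the counts follow count(rank) = c * rank^(-s), fits c and s by least squares on the log-log values of the top N counts, and then extends the fitted curve past rank N until the predicted counts have used up the "other" count.

```python
import math


def fit_power_law(top_counts):
    """Fit log(count) = log(c) - s * log(rank) by ordinary least squares.

    top_counts must hold at least two positive counts, ordered by rank.
    """
    xs = [math.log(rank) for rank in range(1, len(top_counts) + 1)]
    ys = [math.log(count) for count in top_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope  # (c, s)


def estimate_unique(top_counts, other_count):
    """Estimate the number of distinct values from the top N counts and
    the count of documents that fall outside the top N."""
    c, s = fit_power_law(top_counts)
    rank = len(top_counts)
    remaining = float(other_count)
    while remaining > 0:
        rank += 1
        # A value that exists occurs at least once, so floor the
        # prediction at 1; this also guarantees the loop terminates.
        remaining -= max(c * rank ** (-s), 1.0)
    return rank


top10 = [1000.0 / rank for rank in range(1, 11)]
print(estimate_unique(top10, other_count=500))
```

The loop runs once per estimated value, so for a very large "other" count a closed-form or integral approximation of the tail sum would be a better fit. The quality of the estimate depends entirely on how well the data actually follows a power law.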