Getting the number of unique facet values


(Zmicier) #1

Is it possible to get the total number of unique facet values without
getting all of them?

For example, there may be thousands of unique facet values for a
particular facet, and while I am getting only top 10 facet values
with counts, I also want to know how many of the unique facet values
remain (for example, for pagination).

Note that this is different from the total facet count currently
returned. As far as I understand, the information should be available
internally. It would be great to be able to get it out of the system.

Thank you

Dmitry


(Shay Banon) #2

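This one is tricky, because we do a "scatter/gather" type computation of
the terms facet: each shard computes its own terms and the results are
merged. We could potentially return a unique count per shard, but
combining those counts would only give an approximation, since the same
term can appear on more than one shard.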
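To see why per-shard numbers only approximate the answer, here is a small Python sketch. It is plain Python, not Elasticsearch code, and the shard contents are made up. A term stored on several shards is counted once per shard, so per-shard unique counts can only bound the global number of unique values.

```python
# Hypothetical example: each shard holds a list of term values for one field.

def per_shard_unique_counts(shards):
    """Number of distinct terms on each shard, computed independently."""
    return [len(set(terms)) for terms in shards]


def unique_count_bounds(shards):
    """Bounds on the global distinct count derived only from per-shard counts.

    The global count is at least the largest per-shard count and at most
    their sum (the sum is exact only if no term appears on two shards).
    """
    counts = per_shard_unique_counts(shards)
    if not counts:
        return 0, 0
    return max(counts), sum(counts)


def exact_unique_count(shards):
    """Exact global distinct count; needs every term gathered in one place."""
    return len(set().union(*shards))


shards = [["a", "b", "c"], ["b", "c", "d"], ["a", "e"]]
print(per_shard_unique_counts(shards))  # [3, 3, 2]
print(unique_count_bounds(shards))      # (3, 8)
print(exact_unique_count(shards))       # 5
```

The exact answer (5) requires shipping every distinct term to one node, which is what makes it expensive when a field has thousands of unique values.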

(In reply to Zmicier, Mon, Mar 26, 2012 at 11:29 PM.)


(Zmicier) #3

Yes, I realized that it is much more difficult to do this in a
distributed system. In my system, the data follows a power-law
distribution, so I used the facet statistics that elastic currently
returns (top N counts, total count, and other count) to fit the
distribution, and then used the fit to estimate the unique count. It
does a reasonable job.

(In reply to Shay Banon, Mar 27, 2:56 pm.)

