Clients will 100% notice if these counts are off, especially for "New", since that's what they look at every day to make sure they're not wasting fresh leads.
This is a SaaS-based product, and visibility of contacts is based on permissions. The permissions can be very tricky, since there are several different layers of access control that can be granted at any point in time. As such, recalculating people's counts whenever permissions change would be quite infeasible.
So... is there any way that I can get accurate counts without having to keep an actual counter somewhere?
If the total number of unique tags in your system is fewer than 1,000, then results for your query should be accurate, since you asked to consider the top 1,000 values.
It's only if you have, say, a million tags in a sharded index that the counts for the top 1,000 may be off in results.
The total number of unique tags in the system is in fact in the millions, because this is a SaaS product. However, users will typically have no more than a couple hundred unique "accessible" tags. That's why the permissions_cache filter is there.
It's all about the size of the set that each shard brings back to the "reducer node" vs. the size of the set that matches the query.
If a shard returns only a subset of the accessible tags it matched for a query, then some information is left behind: lower-scoring tags from that shard will be omitted from the final analysis (popular tags are still likely to be accurate).
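To make the shard-vs-reducer point concrete, here is a rough simulation in plain Python. It is only a sketch, not how Elasticsearch is implemented: the shard contents, the tag names, and the function names `sharded_top_terms` and `exact_top_terms` are all made up for illustration. Each "shard" reports only its top `shard_size` tags, and the reducer sums whatever it receives.

```python
from collections import Counter

def sharded_top_terms(shards, size, shard_size):
    """Sum the top `shard_size` tag counts from each shard, return the global top `size`."""
    merged = Counter()
    for shard in shards:
        # A shard reports only its own top `shard_size` tags; the rest is left behind.
        for tag, count in Counter(shard).most_common(shard_size):
            merged[tag] += count
    return merged.most_common(size)

def exact_top_terms(shards, size):
    """Sum every tag count from every shard (what a full scan would give)."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(size)

# Hypothetical per-shard tag counts for one user's accessible contacts.
shards = [
    {"new": 12, "hot": 9, "cold": 2},
    {"hot": 10, "cold": 9, "new": 8},
    {"cold": 10, "new": 9, "hot": 1},
]

truncated = dict(sharded_top_terms(shards, size=3, shard_size=2))
exact = dict(exact_top_terms(shards, size=3))
print(truncated)  # counts are too low because each shard dropped its third tag
print(exact)
```

In this toy data every shard holds three tags but returns two, so all three tags are undercounted. Raising `shard_size` to 3 (at least the number of tags a shard holds for that user) makes the merged counts match the exact ones.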
If your individual users each have fewer than 1,000 tags, all should still be well. If a user has a massive number of tags, then there is the potential for error, but we tell you about this in the search results. Use this feedback to figure out whether you have a problem before we talk about potential solutions.
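A minimal sketch of checking that feedback, assuming it refers to the `doc_count_error_upper_bound` value that Elasticsearch returns alongside the buckets of a terms aggregation. The aggregation name `tags`, the sample response, and the helper `terms_agg_may_be_inexact` are hypothetical; only the response field names come from Elasticsearch.

```python
def terms_agg_may_be_inexact(response, agg_name):
    """Return True if the terms aggregation reports a non-zero worst-case count error."""
    agg = response["aggregations"][agg_name]
    # 0 means no shard left out a term that could have changed the returned counts.
    return agg.get("doc_count_error_upper_bound", 0) > 0

# Hypothetical search response, trimmed to the aggregation part.
sample_response = {
    "aggregations": {
        "tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"key": "New", "doc_count": 42}],
        }
    }
}

if terms_agg_may_be_inexact(sample_response, "tags"):
    print("counts may be off for this user; consider a larger shard_size")
else:
    print("counts are exact for this user")
```

Logging this per request would show whether any real user ever hits a non-zero error bound before investing in a different approach.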