I have an index with millions of documents; most of them contain an MD5 hash value.
I want to group by the hash value, calculate the number of documents per hash, and then sum those counts, but only for buckets with at least two documents.
I do this using Kibana and Elasticsearch (7.1). I got this working, but for this particular data set there are more than 800K group-by results (buckets), so Elasticsearch runs into a too_many_buckets_exception.
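For reference, the aggregation I'm running looks roughly like this (the index name `my-index` and the keyword field `hash` are placeholders for my actual names; the large terms `size` is what trips the bucket limit):

```json
POST my-index/_search
{
  "size": 0,
  "aggs": {
    "by_hash": {
      "terms": { "field": "hash", "size": 1000000 },
      "aggs": {
        "at_least_2": {
          "bucket_selector": {
            "buckets_path": { "count": "_count" },
            "script": "params.count >= 2"
          }
        }
      }
    },
    "total_duplicates": {
      "sum_bucket": { "buckets_path": "by_hash>_count" }
    }
  }
}
```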
I know I can increase the `search.max_buckets` setting, but as far as I can tell that is something you shouldn't do. Also, in the future the 800K may easily grow to 2 million buckets or more.
How can I get this metric without having to increase the `search.max_buckets` value? Coming from SQL, this seems like a relatively easy question.
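To make the SQL comparison concrete, what I'm after is the equivalent of this query (hypothetical table and column names):

```sql
SELECT SUM(c) AS total
FROM (
  SELECT hash, COUNT(*) AS c
  FROM my_index
  GROUP BY hash
  HAVING COUNT(*) >= 2
) AS grouped;
```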