Thanks as well for taking the time to explain the internals of Elasticsearch in such detail; it was a great help and a good read :).
I still have a couple of questions.
When you said that the hashmap algorithm is still used while the cardinality is below 3000, does that mean the switch happens at execution time, i.e. the moment the threshold is reached the hashmap is replaced by the HLL? And if the precision_threshold is set to, say, 5000, does that become the new limit at which the algorithm changes?
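For instance, just to make sure I'm reading it right: with a request like the one below (the index and field names are placeholders I made up), would the hashmap be kept until roughly 5000 distinct values are seen, and only then be promoted to HLL?

```json
POST /my_index/_search
{
  "size": 0,
  "aggs": {
    "distinct_users": {
      "cardinality": {
        "field": "user_id",
        "precision_threshold": 5000
      }
    }
  }
}
```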
I've also noticed something that left me stunned: the performance of queries that use the cardinality aggregation is affected by the precision_threshold value, but only when the cardinality aggregation is nested below the top level. When the cardinality runs as a top-level aggregation, performance is not affected.
For example, the following aggregation with a precision_threshold of 6000 takes about twice as long as the same query with a precision_threshold of 3000. Note that the top-level aggregation has a size of 1.
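The query has roughly this shape (sketched here with a terms aggregation as the parent and made-up field names; the exact mapping shouldn't matter for the point):

```json
POST /my_index/_search
{
  "size": 0,
  "aggs": {
    "top_term": {
      "terms": { "field": "category", "size": 1 },
      "aggs": {
        "distinct_users": {
          "cardinality": {
            "field": "user_id",
            "precision_threshold": 6000
          }
        }
      }
    }
  }
}
```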
Using the cardinality aggregation at the top level, the time taken is almost the same regardless of the precision_threshold. That is what I would expect given how the algorithm uses the threshold: the cost of computing the cardinality should be the same if the input is the same, with only a memory cost when the precision_threshold is raised.
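In other words, with the top-level variant below (same placeholder field as above), the response times are nearly identical whether I set precision_threshold to 3000 or 6000:

```json
POST /my_index/_search
{
  "size": 0,
  "aggs": {
    "distinct_users": {
      "cardinality": {
        "field": "user_id",
        "precision_threshold": 6000
      }
    }
  }
}
```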
I couldn't find any information about the transition between the hashmap and the HLL in the official ES documentation, and for me it is crucial. I'd suggest adding it to the official guide; it seems more important than an edge case :)