I was reading through the docs on fielddata and came across this passage:
> It usually doesn’t make sense to enable fielddata on text fields. Field data is stored in the heap with the field data cache because it is expensive to calculate. Calculating the field data can cause latency spikes, and increasing heap usage is a cause of cluster performance issues.
I then wanted to understand why keyword would be more performant and came across this topic, which explains how keyword fields use the disk and filesystem cache instead of the heap.
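For concreteness, this is the kind of mapping I'm asking about (the index and field names are just placeholders, and `"doc_values": true` is already the default for keyword fields):

```json
PUT /my-index
{
  "mappings": {
    "properties": {
      "tag": {
        "type": "keyword",
        "doc_values": true
      }
    }
  }
}
```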
My question is: is there still a practical limit on the cardinality of a keyword field, given that it is backed by disk and the filesystem cache rather than the heap?
E.g. how would sorting by a keyword field perform on a simple query if there are 1M distinct keywords, each 32 bytes long? 1 billion? 1 trillion? What's the upper limit, and in what ways would Elasticsearch be bottlenecked?
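The query I have in mind is nothing fancy, just a plain sort on the keyword field, something like this (again, index and field names are placeholders):

```json
GET /my-index/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "tag": "asc" }
  ]
}
```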