Resource upper bounds on keyword sorting

I was reading through the docs on fielddata and came across this:

It usually doesn’t make sense to enable fielddata on text fields. Field data is stored in the heap with the field data cache because it is expensive to calculate. Calculating the field data can cause latency spikes, and increasing heap usage is a cause of cluster performance issues.

Most users who want to do more with text fields use multi-field mappings by having both a text field for full text searches, and an unanalyzed keyword field for aggregations, as follows:
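For reference, the multi-field mapping the docs describe looks roughly like this (the index and field names here are just illustrative):

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```

Full-text queries go against `message`, while sorts and aggregations use the unanalyzed `message.keyword` sub-field.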

I then wanted to explore why keyword would be more performant, and came across this topic, which explains how it uses disk and the filesystem cache instead of the heap.

My question is: is there still a practical limit on the cardinality of a keyword field, given that it is backed by disk and the filesystem cache rather than the heap?

E.g. how would sorting by a keyword field perform on a simple query if there are 1 million distinct 32-byte keywords? 1 billion? 1 trillion? What's the upper limit, and in what ways would Elasticsearch be bottlenecked?
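For concreteness, the kind of query I have in mind is a plain sort on the keyword sub-field, something like this (index and field names are placeholders):

```json
GET my-index/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "message.keyword": { "order": "asc" } }
  ]
}
```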
