Cardinality limit in elastic?


(Roman Margolis) #1

I was wondering whether there is any limit on the cardinality of a string field in an Elasticsearch index or shard.

I know shard sizing depends on the particular data and query patterns, and that massive shards will hurt performance.
But, theoretically speaking, suppose I have a very high-cardinality field with ~10 billion (10,000,000,000) possible values, all in a single shard (with an even bigger doc count). In such a scenario, could the global ordinal map be constructed to represent all possible values? If so, how much memory would it consume? Are global ordinals sequential or random IDs? If the IDs are random, could two different values collide on the same ID? Do the answers to any of these questions change if the shard is optimized (merged down to a single segment)?
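To make the sequential-versus-random question concrete, here is a toy Python model of how global ordinals are commonly described: each segment keeps a sorted term dictionary whose positions are its segment ordinals, and a term's global ordinal is its position in the sorted union of every segment's terms. This is a simplification under that assumption, not Lucene's `OrdinalMap` (which uses compressed structures), and `build_global_ordinals` is a hypothetical name. In this model ordinals are dense and sequential, so two different values can never share one, and with a single segment the mapping reduces to the identity.

```python
from typing import Dict, List, Tuple


def build_global_ordinals(
    segments: List[List[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Toy model of global ordinals.

    Assumption: each segment is a sorted, de-duplicated list of terms, and a
    term's segment ordinal is simply its index in that list.
    Returns the global term list (index == global ordinal) and, for each
    segment, a list mapping segment ordinal -> global ordinal.
    """
    # Sorted union of all unique terms: positions 0..N-1 are the global
    # ordinals, so they are dense and sequential with no collisions.
    global_terms = sorted(set().union(*segments))
    global_ord: Dict[str, int] = {term: i for i, term in enumerate(global_terms)}

    # Per-segment lookup table: segment ordinal (list index) -> global ordinal.
    seg_to_global = [[global_ord[term] for term in seg] for seg in segments]
    return global_terms, seg_to_global


terms, mapping = build_global_ordinals(
    [["apple", "kiwi", "pear"], ["banana", "kiwi"], ["apple", "zucchini"]]
)
print(terms)    # ['apple', 'banana', 'kiwi', 'pear', 'zucchini']
print(mapping)  # [[0, 2, 3], [1, 2], [0, 4]]
```

Note that "kiwi" gets the same global ordinal (2) from both segments it appears in, while its segment ordinals differ (1 and 0).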


(Mark Walkom) #2

First up, that won't work, because there is a limit of roughly 2 billion docs per shard (it's a Lucene limit) :stuck_out_tongue:
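The arithmetic behind that limit can be sketched as follows. It assumes the cap is Lucene's `IndexWriter.MAX_DOCS` (`Integer.MAX_VALUE - 128`, i.e. 2,147,483,519 in recent Lucene versions) and that each Elasticsearch shard is a single Lucene index; `min_shards_for` is a hypothetical helper that ignores every other sizing concern.

```python
# Assumption: Lucene's per-index document cap, IndexWriter.MAX_DOCS,
# equals Integer.MAX_VALUE - 128. Each shard is one Lucene index.
LUCENE_MAX_DOCS = (2**31 - 1) - 128  # 2,147,483,519


def min_shards_for(doc_count: int, max_docs_per_shard: int = LUCENE_MAX_DOCS) -> int:
    """Smallest shard count that keeps every shard at or under the doc cap.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    return (doc_count + max_docs_per_shard - 1) // max_docs_per_shard


print(min_shards_for(10_000_000_000))  # 5
```

Since the question assumes a doc count of more than 10 billion, the doc cap alone forces at least 5 shards, before cardinality even comes into play.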

But even if you managed to get 2 billion docs in there, I'd imagine there could be problems keeping this ordinal list in memory along with everything else. This is something that would really need testing; I don't know whether we have any numbers on this.
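For a very rough sense of scale, the sketch below computes how many bits one ordinal needs and the size of a hypothetical, uncompressed array storing one global ordinal per term per segment. This is a naive assumption for illustration only: Lucene's real mapping is compressed and its footprint depends on the data and the segment layout, which is why actual numbers would need testing. Both function names are hypothetical.

```python
def bits_per_ordinal(num_unique_terms: int) -> int:
    """Bits needed to encode any ordinal in 0..num_unique_terms-1 (minimum 1)."""
    return max(1, (num_unique_terms - 1).bit_length())


def naive_mapping_bytes(entries: int, num_unique_terms: int) -> int:
    """Size of a hypothetical, uncompressed packed array holding one global
    ordinal per entry (e.g. one per term per segment), rounded up to bytes.
    This is NOT how Lucene actually stores the mapping."""
    total_bits = entries * bits_per_ordinal(num_unique_terms)
    return (total_bits + 7) // 8


n = 10_000_000_000
print(bits_per_ordinal(n))        # 34
print(naive_mapping_bytes(n, n))  # 42500000000
```

For 10 billion unique values, each appearing in a single segment, this gives 34 bits per ordinal (2^33 < 10^10 < 2^34) and 42,500,000,000 bytes (about 42.5 GB) for that naive array alone, before counting anything else held in memory.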

