Hi,
I want to understand why a field indexed as a keyword consumes a constant amount of memory (RAM), while the same field indexed as a numeric datatype consumes memory in proportion to the number of documents in the index.
I know keywords are held in RAM as finite state transducers (FST) and numeric fields as a KD tree (BKDReader), as shown by GET /_segments?verbose=true&pretty
(RAM tree)
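For reference, here is a minimal Python sketch of how I total the per-entry sizes from that output. It assumes each `ram_tree` entry carries `description`, `size_in_bytes` and `children` keys, as in the verbose response I get; the sample fragment, its field names and its byte counts are made up for illustration and are not real measurements.

```python
def flatten_ram_tree(nodes, totals=None):
    """Sum size_in_bytes of the leaf entries of a ram_tree list, keyed by description."""
    if totals is None:
        totals = {}
    for node in nodes:
        children = node.get("children", [])
        if children:
            # Parent sizes already include their children, so only leaves are counted.
            flatten_ram_tree(children, totals)
        else:
            desc = node.get("description", "unknown")
            totals[desc] = totals.get(desc, 0) + node.get("size_in_bytes", 0)
    return totals


# Hypothetical fragment shaped like one segment's "ram_tree" (numbers are invented).
sample_ram_tree = [
    {"description": "postings", "size_in_bytes": 300, "children": [
        {"description": "field 'port_keyword'", "size_in_bytes": 300, "children": []},
    ]},
    {"description": "points", "size_in_bytes": 5000, "children": [
        {"description": "field 'port_int'", "size_in_bytes": 5000, "children": []},
    ]},
]

totals = flatten_ram_tree(sample_ram_tree)
for description, size in sorted(totals.items()):
    print(description, size)
```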
For example, if I index TCP port values (integers from 0 to 65535), the KD tree keeps growing while the FST remains constant (65536 docs with each value, 1M docs, 10M docs...)
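To make the observation concrete, here is a rough back-of-envelope model in Python. It is only a sketch of one possible explanation, not a description of how Lucene actually accounts for memory: it assumes the KD tree keeps one in-memory entry per leaf block, that a leaf holds up to 1024 points, and that each entry costs roughly one split value plus one file pointer. The function names and the per-entry byte costs are my own assumptions.

```python
import math

# Assumed leaf capacity of the KD tree (points per leaf block).
MAX_POINTS_IN_LEAF = 1024


def estimated_kd_index_bytes(num_points, bytes_per_value=4, pointer_bytes=8):
    """Rough heap estimate: one (split value + file pointer) entry per leaf block."""
    leaves = max(1, math.ceil(num_points / MAX_POINTS_IN_LEAF))
    return leaves * (bytes_per_value + pointer_bytes)


def unique_terms(num_docs, cardinality=65536):
    """Number of distinct terms a keyword term index has to cover."""
    return min(num_docs, cardinality)


for num_docs in (65536, 1_000_000, 10_000_000):
    print(num_docs, unique_terms(num_docs), estimated_kd_index_bytes(num_docs))
```

Under these assumptions the number of distinct terms stops at 65536 no matter how many documents are added, while the number of leaf blocks (and so the estimated KD tree size) keeps scaling with the document count, which matches what I see.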
Is there a better way to index numeric fields that have a limited number of distinct values and a lot of documents, while keeping the ability to sort on them or run range queries? (For identifiers I can use keywords, but that is not suitable for all numbers.)
Elasticsearch version: 5.5.0