RAM usage of numeric fields with a limited number of distinct values and a lot of documents (KD tree?)


#1

Hi,

I want to understand why a field indexed as a keyword consumes a constant amount of memory (RAM), while the same field indexed as a numeric datatype consumes memory in proportion to the number of documents in the index.

I know keywords are indexed in RAM by finite state transducers (FST) and numeric fields by a KD tree (BKDReader), as shown by GET /_segments?verbose=true&pretty (RAM tree).
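
To compare the two structures across segments, the per-segment RAM tree entries can be summed by component. This is only a minimal sketch: it assumes the verbose segments response nests as indices > shards > shard copies > segments, and that each segment carries a ram_tree list whose entries have description and size_in_bytes keys. The sample data and its shortened descriptions ("postings", "points") are hand-written placeholders, not real output.

```python
from collections import Counter


def ram_by_component(segments_response):
    """Sum top-level ram_tree sizes per description across all segments."""
    totals = Counter()
    for index in segments_response.get("indices", {}).values():
        for shard_copies in index.get("shards", {}).values():
            for shard_copy in shard_copies:
                for segment in shard_copy.get("segments", {}).values():
                    # Only the top-level entries are summed; children are ignored.
                    for node in segment.get("ram_tree", []):
                        totals[node["description"]] += node["size_in_bytes"]
    return dict(totals)


# Hypothetical sample shaped like the assumed response layout (not real output).
sample = {
    "indices": {
        "ports": {
            "shards": {
                "0": [
                    {
                        "segments": {
                            "_0": {
                                "ram_tree": [
                                    {"description": "postings", "size_in_bytes": 100},
                                    {"description": "points", "size_in_bytes": 1000},
                                ]
                            },
                            "_1": {
                                "ram_tree": [
                                    {"description": "postings", "size_in_bytes": 200},
                                    {"description": "points", "size_in_bytes": 3000},
                                ]
                            },
                        }
                    }
                ]
            }
        }
    }
}

if __name__ == "__main__":
    print(ram_by_component(sample))
```

In a real response the descriptions are longer strings that include the reader class name, so grouping by a prefix of the description may be more practical.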

For example, if I index TCP port values (integers from 0 to 65535), the KD tree keeps growing while the FST remains constant (65536 docs covering each value, 1M docs, 10M docs...).
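
For reference, the test can be reproduced roughly as follows. This is a sketch under assumptions: the field names port_num and port_kw and the mapping type name doc are made up for illustration, and the generated lines are meant to be sent as a newline-delimited body to a _bulk endpoint that already names the index and type in its URL.

```python
import json

# Hypothetical index body: the same value indexed once as a numeric field
# and once as a keyword (field and type names are illustrative only).
INDEX_BODY = {
    "mappings": {
        "doc": {
            "properties": {
                "port_num": {"type": "integer"},
                "port_kw": {"type": "keyword"},
            }
        }
    }
}


def bulk_lines(n_docs, n_values=65536):
    """Yield bulk action/source line pairs, cycling through the port values."""
    for i in range(n_docs):
        port = i % n_values
        yield json.dumps({"index": {}})
        yield json.dumps({"port_num": port, "port_kw": str(port)})


if __name__ == "__main__":
    # Print the index body and the first two documents as a quick check.
    print(json.dumps(INDEX_BODY))
    for line in bulk_lines(2):
        print(line)
```

Increasing n_docs while keeping n_values fixed corresponds to the 65536, 1M and 10M document runs described above.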

Is there a better way to index numeric fields with a limited number of distinct values and a lot of documents, while keeping the ability to sort on them or run range queries? (For identifiers I can use keywords, but that is not appropriate for all numbers.)

Elasticsearch version: 5.5.0