RAM usage and numeric fields with a limited number of values and a lot of documents (KD tree?)

LeMic · December 4, 2018, 11:18am

Hi,

I want to understand why a field indexed as a keyword consumes a constant amount of memory (RAM), while the same field indexed as a numeric datatype consumes memory in proportion to the number of documents in the index.

I know keywords are indexed in RAM by finite state transducers (FST) and numeric fields by a KD tree (BKDReader), as shown by the RAM tree in the output of GET /_segments?verbose=true&pretty.
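For reference, here is a minimal Python sketch of how the per-component RAM figures could be totalled from that response. It assumes the verbose output nests as indices > shards > shard copies > segments, and that each segment carries a "ram_tree" list whose entries have "description" and "size_in_bytes" keys. The function name ram_by_component is hypothetical, and the layout should be checked against the actual output of your version.

```python
from collections import defaultdict


def ram_by_component(segments_response):
    """Sum top-level ram_tree sizes per description across all segments.

    Assumes the layout described above (an assumption, not a guaranteed API):
    indices -> shards -> list of shard copies -> segments -> ram_tree.
    """
    totals = defaultdict(int)
    for index in segments_response.get("indices", {}).values():
        for shard_copies in index.get("shards", {}).values():
            for shard_copy in shard_copies:
                for segment in shard_copy.get("segments", {}).values():
                    # Only the top-level entries are summed; children are
                    # already included in their parent's size.
                    for node in segment.get("ram_tree", []):
                        totals[node["description"]] += node["size_in_bytes"]
    return dict(totals)
```

Passing in the parsed JSON body of the request above gives one total per top-level entry, which makes it easier to compare the postings (FST) and points (KD tree) figures as the document count grows.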

For example, if I index TCP port values (integers from 0 to 65535), the KD tree keeps growing while the FST remains constant (65536 docs covering each value, 1M docs, 10M docs...).
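A minimal Python sketch of such a test setup, for anyone who wants to reproduce it. The mapping type "doc" and the field names "port_kw" and "port_int" are hypothetical, and the documents simply cycle through the port range, which is an assumption about how the values are distributed.

```python
import json

# Hypothetical mapping: the same port value indexed once as a keyword
# and once as an integer. The "doc" type level is required on 5.x.
MAPPING = {
    "mappings": {
        "doc": {
            "properties": {
                "port_kw": {"type": "keyword"},
                "port_int": {"type": "integer"},
            }
        }
    }
}


def bulk_lines(num_docs):
    """Yield NDJSON lines for the _bulk API, cycling through ports 0..65535."""
    for i in range(num_docs):
        port = i % 65536
        yield json.dumps({"index": {}})  # action line
        yield json.dumps({"port_kw": str(port), "port_int": port})  # source line


def bulk_body(num_docs):
    """Build a complete _bulk request body (must end with a newline)."""
    return "\n".join(bulk_lines(num_docs)) + "\n"
```

MAPPING would be sent as the body of the index creation request, and bulk_body(n) posted to the index's _bulk endpoint in reasonably sized batches, with n raised from 65536 to 1M and 10M between measurements.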

Is there a better way to index numeric fields with a limited number of distinct values and a lot of documents, while keeping the ability to sort on them or query a range? (For identifiers I can use keywords, but that is not appropriate for all numbers.)

Elasticsearch version : 5.5.0

system · January 1, 2019, 11:18am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
