I have an Elasticsearch 7.2.1 cluster and I'm trying to lower heap memory usage. When I ran GET /_cat/fielddata, I found many non-text fields in the list (e.g. "foo.keyword", "bar_ip"). As far as I understand from the relevant documentation, those fields should neither appear in that list nor occupy heap memory. What am I missing?
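To see which fields account for most of that memory, it can help to total the per-node rows of the cat output by field. A minimal sketch, assuming the body was fetched with GET /_cat/fielddata?format=json&bytes=b, so each row carries a "field" key and a "size" key holding a byte count as a string. The sample response below is made up for illustration, and fielddata_by_field is a hypothetical helper, not part of any Elasticsearch client.

```python
import json
from collections import defaultdict


def fielddata_by_field(cat_json: str) -> list[tuple[str, int]]:
    """Sum per-node fielddata sizes for each field, largest first.

    Assumes the body came from GET /_cat/fielddata?format=json&bytes=b,
    so every row has a "field" key and a "size" key with a byte count
    encoded as a string.
    """
    totals: dict[str, int] = defaultdict(int)
    for row in json.loads(cat_json):
        totals[row["field"]] += int(row["size"])
    # Largest consumers first; ties broken by field name for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical response body, invented for illustration only.
sample = json.dumps([
    {"id": "n1", "node": "node-1", "field": "foo.keyword", "size": "1048576"},
    {"id": "n2", "node": "node-2", "field": "foo.keyword", "size": "524288"},
    {"id": "n1", "node": "node-1", "field": "bar_ip", "size": "4096"},
])

for field, size in fielddata_by_field(sample):
    print(f"{field}: {size} bytes")
```

Summing across nodes gives a cluster-wide view per field; drop the aggregation if you want to spot a single node under heap pressure instead.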
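If you run aggregations or sort on keyword fields, especially high-cardinality ones (e.g. a field holding a unique ID, or the document's _id), Elasticsearch builds global ordinals for those fields. Because global ordinals are expensive to build, they are cached on the heap, indefinitely by default. That cached memory is what GET /_cat/fielddata reports, which is why keyword fields show up there even though they are backed by doc values.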
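As an illustration of the kind of request meant here, the sketch below builds a search body with a terms aggregation on "foo.keyword" (the field name is taken from the question; the aggregation name "top_values" and the helper terms_agg_body are hypothetical). Running such a request against a keyword field is what causes each shard to build and cache global ordinals for that field if they are not already loaded. It only builds the JSON body and does not talk to a cluster.

```python
import json


def terms_agg_body(field: str, size: int = 10) -> dict:
    """Build a search body with a terms aggregation on `field`.

    On a keyword field, executing this request makes each shard build
    global ordinals for `field` (if not already cached), and that heap
    usage is what GET /_cat/fielddata reports.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {
            "top_values": {  # hypothetical aggregation name
                "terms": {"field": field, "size": size},
            }
        },
    }


body = terms_agg_body("foo.keyword")
print(json.dumps(body, indent=2))
```

The body would be sent as GET /<index name>/_search; the more distinct values the field has, the larger the resulting global ordinals.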
A less obvious case is Kibana: when you use KQL autocomplete on the _id field, Kibana runs an aggregation on that field behind the scenes, which loads global ordinals for it.
If the global ordinals were loaded by mistake (e.g. by a bad aggregation or query), you can clear the cache with POST /<index name>/_cache/clear?fielddata=true. See the clear cache API documentation for details.
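For scripting the cleanup, here is a minimal sketch using only the Python standard library. It assumes a cluster reachable at http://localhost:9200 without authentication and an index named "my-index" (both hypothetical). The optional fields argument maps to the clear cache API's fields query parameter, which limits the fielddata clearing to the listed fields. The helper only builds the request; sending it is left to urllib.request.urlopen(req).

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def clear_fielddata_request(base_url: str, index: str,
                            fields: Optional[list[str]] = None) -> Request:
    """Build (but do not send) POST /<index>/_cache/clear?fielddata=true.

    If `fields` is given, the request is limited to those fields via the
    `fields` query parameter.
    """
    params = {"fielddata": "true"}
    if fields:
        params["fields"] = ",".join(fields)
    url = f"{base_url.rstrip('/')}/{quote(index)}/_cache/clear?{urlencode(params)}"
    return Request(url, method="POST")


# Hypothetical host and index name; adjust for your cluster.
req = clear_fielddata_request("http://localhost:9200", "my-index", ["foo.keyword"])
print(req.get_method(), req.full_url)
```

Re-running GET /_cat/fielddata afterwards should show whether the field's entry has dropped; it will come back the next time an aggregation on that field runs.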