Understanding fielddata size

cwinkler · September 18, 2019, 9:33am

We have a field containing path information such as: a>b>c>d
The paths usually consist of 4 segments.

The field is mapped both as a keyword field and as an analyzed text field using the path-tokenizer.
Fielddata is enabled for the analyzed text field because we run aggregations over the path segments.

I would have expected the fielddata for the text field to be at most 4 times the size of that for the keyword field, since each value should produce 4 path tokens, like [a>b>c>d, a>b>c, a>b, a].
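
To make that expectation concrete, here is a minimal Python sketch of what I understand the path_hierarchy tokenizer plus the lowercase filter to emit for one value. It is only a simplified emulation under that assumption, not the actual Lucene implementation, and the function name path_hierarchy_tokens is made up for illustration.

```python
def path_hierarchy_tokens(value, delimiter=">"):
    """Return one lowercased token per path prefix (simplified emulation)."""
    segments = value.split(delimiter)
    # Build the prefixes: first segment, first two segments, and so on.
    return [delimiter.join(segments[: i + 1]).lower() for i in range(len(segments))]


tokens = path_hierarchy_tokens("a>b>c>d")
print(tokens)       # the four prefixes of the path
print(len(tokens))  # one token per segment
```

So a 4-segment path yields 4 tokens, which is where my "at most 4 times" estimate comes from.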

However, when using the cat API to list fielddata, I see about 100 KB for the keyword field (path.untouched) and about 400 MB for the text field (path).

So the fielddata for the text field is about 4000 times larger than that for the keyword field.
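
For reference, the arithmetic behind that figure, as a small Python sketch. It assumes decimal units (1 KB = 1000 bytes); if the sizes are read as binary units the ratio comes out as 4096 instead, which does not change the picture. The variable names are just for illustration.

```python
# Approximate sizes as reported for the two fields (decimal units assumed).
keyword_bytes = 100 * 1000        # path.untouched, about 100 KB
text_bytes = 400 * 1000 * 1000    # path, about 400 MB

observed_ratio = text_bytes / keyword_bytes
expected_max_ratio = 4            # one token per path segment

print(observed_ratio)                       # how much larger the text field is
print(observed_ratio / expected_max_ratio)  # factor above my expectation
```

In other words, the observed size is roughly 1000 times above the upper bound I had expected.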

Can anyone explain?

Here is the relevant excerpt from the mapping. The Elasticsearch version is 6.4.

"analysis": {
	"analyzer": {
		"path-analyzer": {
			"type": "custom",
			"tokenizer": "path-tokenizer",
			"filter": "lowercase"
		},
		"lowercase-analyzer": {
			"type": "custom",
			"tokenizer": "keyword",
			"filter": "lowercase"
		}
	},
	"tokenizer": {
		"path-tokenizer": {
			"type": "path_hierarchy",
			"delimiter": ">"
		}
	}
}
...
"properties": {
	"path": {
		"type": "text",
		"analyzer": "path-analyzer",
		"search_analyzer": "lowercase-analyzer",
		"fielddata": true,
		"fields": {
			"search": {
				"type": "text",
				"analyzer": "standard"
			},
			"untouched": {
				"type": "keyword"
			}
		}
	},...

cwinkler · September 24, 2019, 1:59pm

Is there nobody who can help me with this excessive memory usage by fielddata?

In a production environment, fielddata occupies about 2 GB for that single field.
That really is an issue.

system · October 22, 2019, 1:59pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
