Resource upper bounds on keyword sorting

rex-remind · December 9, 2020, 8:47am

I was reading through the docs on fielddata and came across this:

It usually doesn’t make sense to enable fielddata on text fields. Field data is stored in the heap with the field data cache because it is expensive to calculate. Calculating the field data can cause latency spikes, and increasing heap usage is a cause of cluster performance issues.

Most users who want to do more with text fields use multi-field mappings by having both a text field for full text searches, and an unanalyzed keyword field for aggregations, as follows:
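A multi-field mapping of this kind might look roughly like the sketch below. The field name `title` and the sub-field name `raw` are placeholders and do not come from the docs excerpt. The request bodies are built as Python dicts only so they can be printed as JSON.

```python
import json

# Hypothetical index mapping: "title" is analyzed for full-text search,
# and "title.raw" is an unanalyzed keyword copy for sorting/aggregations.
mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "raw": {"type": "keyword"}
                },
            }
        }
    }
}

# Search on the text field, sort on the keyword sub-field.
search_body = {
    "query": {"match": {"title": "quick brown fox"}},
    "sort": [{"title.raw": {"order": "asc"}}],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

With a mapping like this, fielddata never needs to be enabled on `title` itself, because sorting goes through `title.raw`.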

I then wanted to understand why keyword fields would perform better, and came across this topic, which explains that they use the disk and filesystem cache instead of the heap.

My question is: is there still a limit to the cardinality of a keyword field, even though it is backed by disk and the filesystem cache rather than the heap?

For example, how would sorting by a keyword field perform on a simple query if there are 1 million distinct keywords of 32 bytes each? 1 billion? 1 trillion? What is the upper limit, and in what ways would Elasticsearch be bottlenecked?
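To put rough numbers on those three cases, here is a back-of-envelope sketch of the raw size of the distinct values alone. It assumes every keyword is exactly 32 bytes and ignores whatever encoding Elasticsearch applies on disk, as well as any per-document overhead, so it is only an order-of-magnitude figure and not a prediction of actual index size. The helper names are made up for illustration.

```python
def raw_term_bytes(unique_terms, bytes_per_term=32):
    """Raw, unencoded size of the distinct keyword values."""
    return unique_terms * bytes_per_term


def human(n):
    """Format a byte count using decimal (power-of-1000) units."""
    for unit in ["B", "KB", "MB", "GB", "TB", "PB"]:
        if n < 1000:
            return f"{n:.0f} {unit}"
        n /= 1000
    return f"{n:.0f} EB"


for label, count in [("1 million", 10**6), ("1 billion", 10**9), ("1 trillion", 10**12)]:
    print(f"{label}: {human(raw_term_bytes(count))}")
```

So the raw values alone go from something that fits comfortably in the filesystem cache, to something that may or may not, to something that will not fit on a typical single node.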

Thanks

system · January 6, 2021, 8:47am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
