Hi all,
we have a large amount of data, and one of its fields is 'email subject', which is of type 'text'. Recently I've wanted to aggregate on this field over a small subset of the data
(assume roughly 100K hits to aggregate after the query filter).
So I plan to change the mapping to add a keyword sub-field, something like:
"Subject": {
  "type": "text",
  "fields": {
    "Raw": {
      "type": "keyword"
    }
  }
}
And then aggregate on Subject.Raw.
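For context, what I have in mind is a plain terms aggregation on that sub-field (here match_all just stands in for my real filter, and the size is an arbitrary example):

```json
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "top_subjects": {
      "terms": {
        "field": "Subject.Raw",
        "size": 100
      }
    }
  }
}
```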
I know this is fine for normal fields. But since email subjects can be pretty long and have very high cardinality, I'm worried that using 'keyword' there could cause some kind of memory blow-up, etc.
I guess the 100K hits at query time aren't a problem, as that's pretty small. But is there any risk when Elasticsearch does things like building the index for this long, high-cardinality 'keyword' field across the whole (huge) dataset?
Maybe I can use 'ignore_above' to limit the keyword length... but then what about the cardinality?
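To clarify, the 'ignore_above' variant of the mapping I'm considering would look like this (256 is just an example limit; as I understand it, subjects longer than that would simply not be indexed into the keyword sub-field, though the 'text' field would still be searchable):

```json
"Subject": {
  "type": "text",
  "fields": {
    "Raw": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
```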
Thanks a lot!