Hi all,
we have a large amount of data, and one of its fields is 'email subject', which is of type 'text'. Recently I've wanted to aggregate on this field over a small subset of the data
(assume roughly 100K hits to aggregate after the query filter).
So I plan to change the mapping to add a keyword sub-field, something like:
"Subject": {
  "type": "text",
  "fields": {
    "Raw": {
      "type": "keyword"
    }
  }
}
And then aggregate on Subject.Raw.
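For context, what I have in mind is a plain terms aggregation on that sub-field (here match_all just stands in for my real filter, and the size is an arbitrary example):

```json
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "top_subjects": {
      "terms": {
        "field": "Subject.Raw",
        "size": 100
      }
    }
  }
}
```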
I know this is fine for normal fields. But since email subjects can be pretty long and have very high cardinality, I'm worried that using 'keyword' there could cause some kind of memory blow-up, etc.
I guess the 100K hits at query time aren't a problem, as that's pretty small. But is there any risk when Elasticsearch does things like building the index for this long, high-cardinality 'keyword' field across the whole (huge) dataset?
Maybe I can use 'ignore_above' to limit the keyword length... but then what about the cardinality?
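To clarify, the 'ignore_above' variant of the mapping I'm considering would look like this (256 is just an example limit; as I understand it, subjects longer than that would simply not be indexed into the keyword sub-field, though the 'text' field would still be searchable):

```json
"Subject": {
  "type": "text",
  "fields": {
    "Raw": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
```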
Thanks a lot!