Is there any work being done on adding doc_values support for analyzed string fields? doc_values help a lot with the speed of aggregations and sorting.
You may be aware of this, but the string datatype is no longer really around as of 5.0: it's been replaced with the text and keyword types. You can find more information about that here. keyword types already default to doc_values set to true, so there are two plausible implicit questions:
- Does it ever make sense to have stemmed, multi-token strings like the values in text fields go into doc_values?
- Does it ever make sense to apply some analysis to single-token strings like the values in keyword fields and keep them in doc_values?
The answer to the first question is "probably not." In your example, most of the time when you're sorting, you want to sort by the original raw text (e.g. that in a keyword field) rather than on the individual stemmed sub-tokens, and the same is generally true of typical aggs usage as well.
However, the answer to the second is "yes!" And in 5.2, we released normalizers for this purpose. That is, you can do some lightweight analysis such as lowercasing or removing accents. There are other questions as to whether we should add other normalizer capabilities (e.g. query-time synonyms), which still haven't been answered.
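As a rough sketch of what such a normalizer looks like, the Python snippet below builds the request body for creating an index with a lowercasing, accent-folding normalizer on a keyword field. The names my_normalizer and title are made up for illustration, and the typeless mappings layout assumes a recent Elasticsearch version (5.x and 6.x nested the properties under a mapping type).

```python
import json


def build_index_body(field_name="title", normalizer_name="my_normalizer"):
    """Return an index-creation body with a custom normalizer on a keyword field.

    field_name and normalizer_name are hypothetical; pick your own.
    """
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    normalizer_name: {
                        "type": "custom",
                        # Lightweight, single-token analysis only:
                        # lowercase the value and strip accents.
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                field_name: {
                    # keyword fields have doc_values enabled by default,
                    # so the normalized value is usable for sorting and aggs.
                    "type": "keyword",
                    "normalizer": normalizer_name,
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON you would send when creating the index.
    print(json.dumps(build_index_body(), indent=2))
```

The printed JSON can be sent as the body of an index-creation request with whichever client you use.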
The lowercase normalizer loses the case of the original value. How can I keep it?
Normalizers keep the original value/casing in _source.
But it cannot be used to aggregate. I cannot aggregate by the original value/casing and order the aggregation case-insensitively.
Can you describe what you're trying to do? You want to have all cases of values returned in an aggregation but have the results of the aggregation sorted case insensitively?
Yes. I think that I cannot use the keyword type.
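One possible approach, sketched here under assumptions rather than confirmed in this thread: keep the raw keyword field, add a sub-field that uses a lowercase normalizer, run a terms aggregation on the normalized sub-field ordered by key, and nest a second terms aggregation on the raw field to recover every original casing. The field names name and name.lower and the aggregation names are hypothetical, and ordering by _key assumes Elasticsearch 6.0 or later (earlier versions used _term).

```python
def build_case_insensitive_agg(raw_field="name", lower_field="name.lower", size=100):
    """Build a search body: group by lowercased value, list original casings inside.

    raw_field is assumed to be a plain keyword field and lower_field a
    keyword sub-field with a lowercase normalizer (both names are made up).
    """
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {
            "by_lowercase": {
                "terms": {
                    "field": lower_field,
                    "size": size,
                    # Keys are already lowercased, so key order is
                    # effectively case-insensitive.
                    "order": {"_key": "asc"},
                },
                "aggs": {
                    "original_casings": {
                        "terms": {"field": raw_field, "size": size}
                    }
                },
            }
        },
    }


def flatten_buckets(response):
    """Turn the nested buckets into (original_value, doc_count) rows.

    Rows keep the outer case-insensitive order; inside one group they
    follow whatever order the inner terms aggregation returned.
    """
    rows = []
    for outer in response["aggregations"]["by_lowercase"]["buckets"]:
        for inner in outer["original_casings"]["buckets"]:
            rows.append((inner["key"], inner["doc_count"]))
    return rows
```

flatten_buckets expects the parsed JSON response of the search as a dict. The nested aggregation costs more than a single terms aggregation, so it is worth checking against your data volume.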