DocValues on Strings Analyzed

(Bill) #1

Is there any work being done on supporting doc_values for analyzed string fields? Doc values help a lot with the speed of aggregations and sorting.

(Shane Connelly) #2

You may be aware of this, but the string datatype is no longer really around as of 5.0: it's been replaced with the text and keyword types. You can find more information about that here. keyword fields already default to doc_values set to true, so there are two plausible questions implicit in yours:

  • Does it ever make sense to have stemmed, multi-token strings like those values in text fields go into doc_values?
  • Does it ever make sense to apply some analysis to single-token strings like those values in keyword fields and keep them in doc_values?

The answer to the first question is "probably not." Most of the time when you're sorting, you want to sort by the original raw text (e.g. the value in a keyword field) rather than by the individual stemmed sub-tokens, and the same is generally true of typical aggs usage as well.
However, the answer to the second is "yes!" And in 5.2, we released normalizers for this purpose. That is, you can do some lightweight analysis such as lowercasing or removing accents. There are other questions as to whether we should add other normalizer capabilities (e.g. query-time synonyms), which still haven't been answered.
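
As a rough illustration of the normalizer setup described above, here is a minimal Python sketch that only builds the index-creation body and prints it as JSON; it does not talk to a cluster. The names `my_normalizer` and `city` are made up for the example, and the mapping layout assumes a typeless mapping (7.x and later). On 5.x and 6.x the `properties` block would sit under a mapping type name instead.

```python
import json


def build_index_body():
    """Return an index-creation body with a keyword field that uses a normalizer.

    "my_normalizer" and "city" are hypothetical names chosen for illustration.
    """
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    "my_normalizer": {
                        "type": "custom",
                        # Lightweight, single-token filters: lowercase and accent folding.
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "city": {
                    # keyword fields default to doc_values: true, so nothing extra is set here.
                    "type": "keyword",
                    "normalizer": "my_normalizer",
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

The printed JSON could be sent as the body of an index-creation request. The normalized term is what ends up in the index and in doc_values.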

(Giovanni Caputo) #3

A lowercase normalizer loses the case of the original value. How can I keep it?

(Shane Connelly) #4

Normalizers keep the original value and casing in _source.
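
To make the distinction concrete, here is a small pure-Python simulation. It is not Elasticsearch code: `normalize` is only a rough stand-in for a lowercase plus accent-folding normalizer, and `index_document` is a hypothetical helper that mimics keeping `_source` untouched while storing a normalized value for sorting and aggregations.

```python
import unicodedata


def normalize(value: str) -> str:
    """Approximate a lowercase + accent-folding normalizer (illustrative only)."""
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop combining marks (accents), then lowercase what is left.
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.lower()


def index_document(source: dict, field: str) -> dict:
    """Hypothetical helper: keep the original document, store a normalized value beside it."""
    return {
        "_source": dict(source),                 # original value and casing, unchanged
        "doc_value": normalize(source[field]),   # what sorting and aggs would see
    }


doc = index_document({"city": "São Paulo"}, "city")

if __name__ == "__main__":
    print(doc["_source"]["city"])
    print(doc["doc_value"])
```

In this sketch the original string is still available when the document is fetched, but anything that reads the stored field value only sees the normalized form.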

(Giovanni Caputo) #5

But _source cannot be used to aggregate. I cannot aggregate by the original value and casing and order the aggregation results case-insensitively.

(Shane Connelly) #6

Can you describe what you're trying to do? You want to have all cases of values returned in an aggregation but have the results of the aggregation sorted case insensitively?

(Giovanni Caputo) #7

Yes. I think that I cannot use the keyword type for that.
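
The thread does not settle this, but one possible approach can be sketched under stated assumptions: the field is mapped as a plain keyword (here called `name`) with a sub-field (here `name.lower`) that uses a lowercase normalizer. A terms aggregation on the normalized sub-field, ordered by key, would then give case-insensitive ordering, and a nested terms aggregation on the raw field would return each original casing. The field names, the aggregation names, and the sample response below are all hypothetical. The `_key` ordering assumes 6.0 or later; earlier versions used `_term`.

```python
def build_agg_request(field="name", normalized_subfield="name.lower", size=10):
    """Build a search body: outer buckets sorted case-insensitively, inner buckets keep casing."""
    return {
        "size": 0,  # only aggregation results are wanted, no hits
        "aggs": {
            "by_normalized": {
                "terms": {
                    "field": normalized_subfield,
                    "size": size,
                    "order": {"_key": "asc"},
                },
                "aggs": {
                    "original_casings": {"terms": {"field": field, "size": size}}
                },
            }
        },
    }


def flatten_buckets(response):
    """Turn the nested buckets into (original_value, doc_count) rows, keeping the outer order."""
    rows = []
    for outer in response["aggregations"]["by_normalized"]["buckets"]:
        for inner in outer["original_casings"]["buckets"]:
            rows.append((inner["key"], inner["doc_count"]))
    return rows


# Hand-written sample in the shape of a terms aggregation response, not real cluster output.
sample_response = {
    "aggregations": {
        "by_normalized": {
            "buckets": [
                {"key": "apple", "doc_count": 3, "original_casings": {"buckets": [
                    {"key": "Apple", "doc_count": 2},
                    {"key": "apple", "doc_count": 1},
                ]}},
                {"key": "banana", "doc_count": 1, "original_casings": {"buckets": [
                    {"key": "Banana", "doc_count": 1},
                ]}},
            ]
        }
    }
}

if __name__ == "__main__":
    for value, count in flatten_buckets(sample_response):
        print(value, count)
```

The trade-off in this sketch is an extra sub-field and a nested aggregation. In exchange, both the case-insensitive ordering and the original casings come from doc_values-backed keyword fields.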
