DocValues on Strings Analyzed

billnbell · April 11, 2017, 3:09am

Is any work being done on supporting doc_values for analyzed string fields? Doc values help a lot with the speed of aggregations and sorting.

https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html

shanec · April 14, 2017, 3:31pm

You may be aware of this, but the string datatype is effectively gone as of 5.0: it has been replaced by the text and keyword types. You can find more information about that here. Keyword fields already default to doc_values set to true, so there are two plausible implicit questions:

Does it ever make sense to put stemmed, multi-token strings, like the values in text fields, into doc_values?
Does it ever make sense to apply some analysis to single-token strings, like the values in keyword fields, and keep the result in doc_values?

The answer to the first question is "probably not." In your case, when you sort you usually want to sort by the original raw text (e.g. the value in a keyword field) rather than by the individual stemmed tokens, and the same is generally true of typical aggregations.
The answer to the second, however, is "yes!", and in 5.2 we released normalizers for exactly this purpose: they let you apply lightweight analysis, such as lowercasing or removing accents, to keyword fields. Whether normalizers should gain other capabilities (e.g. query-time synonyms) is still an open question.
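A minimal sketch of that pattern, assuming a hypothetical field named title: the analyzed text field serves full-text queries, while a keyword sub-field (hypothetically named raw) holds the unanalyzed value in doc_values for sorting and aggregations. Only the request bodies are built here, as Python dicts serialized to JSON; sending them to a cluster is left out, and only the properties part of the mapping is shown.

```python
import json

# Hypothetical field "title"; only the "properties" object of a mapping is shown.
properties = {
    "title": {
        "type": "text",  # analyzed: used for full-text search, no doc_values
        "fields": {
            "raw": {
                "type": "keyword",  # one unanalyzed token; doc_values enabled by default
            }
        },
    }
}

# Search on the analyzed field, but sort and aggregate on the keyword sub-field.
search_body = {
    "query": {"match": {"title": "quick brown fox"}},
    "sort": [{"title.raw": "asc"}],
    "aggs": {"titles": {"terms": {"field": "title.raw"}}},
}

print(json.dumps({"properties": properties}, indent=2))
print(json.dumps(search_body, indent=2))
```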
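As a sketch, an index body that defines a custom normalizer and applies it to a keyword field might look like the following. The normalizer name lowercase_ascii and the field name brand are hypothetical; lowercase and asciifolding are built-in token filters. The mappings section uses the typeless layout of 7.0 and later; on 5.x/6.x the properties object would sit under a mapping type name.

```python
import json

settings = {
    "analysis": {
        "normalizer": {
            "lowercase_ascii": {  # hypothetical normalizer name
                "type": "custom",
                "char_filter": [],
                # Lowercase first, then fold accented characters to ASCII.
                "filter": ["lowercase", "asciifolding"],
            }
        }
    }
}

# Hypothetical keyword field that is normalized before it is indexed into doc_values.
properties = {"brand": {"type": "keyword", "normalizer": "lowercase_ascii"}}

index_body = {"settings": settings, "mappings": {"properties": properties}}
print(json.dumps(index_body, indent=2))
```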

Giovanni_Caputo · April 18, 2017, 10:01am

A lowercase normalizer loses the case of the original value. How can I keep it?

shanec · April 21, 2017, 4:36pm

Normalizers don't touch the original value: its casing is still kept in _source.
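To illustrate, the sketch below approximates a lowercase + asciifolding normalizer in plain Python and shows that only the indexed term (the one kept in doc_values) changes, while the document as sent, which is what _source stores, keeps its original casing. This is an assumption-laden stand-in: Lucene's ASCII folding covers more characters than this Unicode-decomposition trick, and the document and field name are hypothetical.

```python
import unicodedata


def normalize(value: str) -> str:
    """Rough stand-in for a lowercase + asciifolding normalizer (not Lucene's code)."""
    lowered = value.lower()
    # Decompose accented characters (e.g. "é" -> "e" + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


source = {"brand": "Café Bar"}          # hypothetical document, stored verbatim in _source
doc_value = normalize(source["brand"])  # what the normalized keyword field indexes

print(source["brand"])  # Café Bar
print(doc_value)        # cafe bar
```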

Giovanni_Caputo · April 24, 2017, 7:19am

But _source can't be used for aggregations. I can't aggregate on the original value (with its casing) and also order the aggregation results case-insensitively.

shanec · April 24, 2017, 3:58pm

Can you describe what you're trying to do? You want every casing of the values returned in an aggregation, but the aggregation results sorted case-insensitively?

Giovanni_Caputo · April 25, 2017, 4:55am

Yes. I think I can't use the keyword type for this.
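One possible workaround, sketched under assumptions: index the value twice, as a plain keyword field (hypothetically name) and as a normalized keyword sub-field (hypothetically name.normalized, using a normalizer assumed to be called lowercase_ascii). An outer terms aggregation on the normalized field gives the case-insensitive order, and an inner terms aggregation on the raw field recovers the original casings in each bucket. The response below is hand-written for illustration, not real cluster output.

```python
import json

# Hypothetical mapping; assumes a normalizer "lowercase_ascii" is defined in the index settings.
properties = {
    "name": {
        "type": "keyword",  # original value and casing
        "fields": {
            "normalized": {"type": "keyword", "normalizer": "lowercase_ascii"}
        },
    }
}

# Outer terms agg: one bucket per case-insensitive value, ordered by that key.
# Inner terms agg: the original casings that fall into each bucket.
search_body = {
    "size": 0,
    "aggs": {
        "by_name": {
            "terms": {"field": "name.normalized", "order": {"_key": "asc"}, "size": 100},
            "aggs": {"original_casings": {"terms": {"field": "name"}}},
        }
    },
}


def flatten(response):
    """Return (original value, doc_count) pairs in case-insensitive order."""
    rows = []
    for outer in response["aggregations"]["by_name"]["buckets"]:
        for inner in outer["original_casings"]["buckets"]:
            rows.append((inner["key"], inner["doc_count"]))
    return rows


# Hypothetical response shaped like a nested terms aggregation result.
response = {
    "aggregations": {
        "by_name": {
            "buckets": [
                {"key": "apple", "doc_count": 3, "original_casings": {"buckets": [
                    {"key": "Apple", "doc_count": 2},
                    {"key": "apple", "doc_count": 1},
                ]}},
                {"key": "banana", "doc_count": 5, "original_casings": {"buckets": [
                    {"key": "Banana", "doc_count": 5},
                ]}},
            ]
        }
    }
}

print(json.dumps(search_body, indent=2))
print(flatten(response))  # [('Apple', 2), ('apple', 1), ('Banana', 5)]
```

Ordering the raw keyword field by key would compare bytes, putting every capitalized value ahead of the lowercase ones; ordering the normalized field keeps Apple and apple together before Banana. On 5.x the order key is _term rather than _key, and the outer size (100 here, an arbitrary value) still caps how many buckets come back.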

system · May 23, 2017, 4:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
