Is KEYWORD data type analyzed as well?

kasia · January 17, 2017, 10:52am

Hi,
I thought inverted indexes were created for all fields, regardless of their data types, and that only TEXT fields were analyzed before indexing... but now, after reading the documentation (Reference 5.1) about the truncate token filter, I'm pretty confused...

This is what the REFERENCE 5.1 says:

The truncate token filter can be used to truncate tokens into a specific length.
This can come in handy with keyword (single token) based mapped fields that are used for sorting in order to reduce memory usage.
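
To make the quoted behaviour concrete, here is a minimal Python sketch of what truncating tokens to a fixed length does. It is only an illustration, not Elasticsearch code: `truncate_filter` and the sample value are hypothetical names made up for this example.

```python
def truncate_filter(tokens, length=10):
    """Cut every token down to at most `length` characters (illustrative only)."""
    return [token[:length] for token in tokens]


# A keyword-style (single token) field yields one token per value,
# so the token stream is a one-element list.
tokens = ["a-very-long-sort-key-0001"]
truncated = truncate_filter(tokens, length=10)
print(truncated)  # ['a-very-lon']
```

Shorter tokens pass through unchanged, so only long values are affected.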

The truncate token filter, like any token filter, is supposed to be part of an analyzer, right?
Does it mean that keyword mapped fields are being analyzed as well?
If so, where do I set an analyzer? The keyword mapping only allows search_analyzer to be set...

Could you please explain what the internals look like?
Is an inverted index indeed created for each field marked as indexed (index=true) in the mapping?
What is the type of the inverted index entries when the indexed field is a DATE or any numeric field?

Thanks in advance,
Kasia

colings86 · January 17, 2017, 11:03am

keyword fields in 5.0 are effectively analyzed using the keyword analyzer, which takes the value of the field and creates a single token for the index whose text is the value of the field. At the moment, this behaviour cannot be changed, but we do have an issue for adding the ability to set a "normaliser" on keyword fields to allow some customisation of how the token is created (such as lowercasing the value). The limitation is that the "normaliser" should always result in a single token for a given field value. The issue is here: https://github.com/elastic/elasticsearch/pull/21919
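
As a rough illustration of the single-token behaviour described above, the following Python sketch contrasts it with word-level analysis. These functions are hypothetical stand-ins written for this example, not the actual Elasticsearch implementation, and `word_analyze` is only a crude approximation of what a text analyzer does.

```python
def keyword_analyze(value):
    """Emit the whole field value as a single token, unchanged."""
    return [value]


def word_analyze(value):
    """Crude stand-in for a text analyzer: split on whitespace and lowercase."""
    return [part.lower() for part in value.split()]


def normalized_keyword_analyze(value, normalizer=str.lower):
    """Apply a per-value normalizer; the result is still exactly one token."""
    return [normalizer(value)]


value = "New York City"
print(keyword_analyze(value))             # ['New York City']
print(word_analyze(value))                # ['new', 'york', 'city']
print(normalized_keyword_analyze(value))  # ['new york city']
```

The last function shows the constraint mentioned above: whatever the normalizer does to the value, one field value still produces one token.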

That part of the documentation is admittedly confusing, since it mixes the keyword analyzer (which can be used on text fields) with keyword field types, on which you can't specify an analyzer. I have opened https://github.com/elastic/elasticsearch/issues/22650 to correct the wording here.

kasia · January 17, 2017, 11:40am

Hi Collin,
Thanks for your explanation.
If I understood correctly, you're using the keyword analyzer for keyword fields internally, but the keyword analyzer itself cannot be configured, right? To add a normalizer, I would need to configure a custom analyzer using a keyword tokenizer and some normalizing token filters... and in that case an "analyzer" parameter would have to be enabled for keyword fields, and this is what you are working on, right?
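
For reference, a custom analyzer of the kind described above (keyword tokenizer plus normalizing token filters) might be declared roughly as follows. This is a sketch built as a Python dict rather than a tested request: the names `truncate_20`, `normalized_keyword`, `doc` and `title_sort` are made up for illustration, the layout assumes the 5.x format where mappings are nested under a type name, and the analyzer is attached to a text field because keyword fields do not accept one.

```python
import json

# Hypothetical index creation body; all names are illustrative assumptions.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Truncate token filter with an explicit length.
                "truncate_20": {"type": "truncate", "length": 20}
            },
            "analyzer": {
                # Keyword tokenizer keeps the whole value as one token;
                # the filters then normalize that single token.
                "normalized_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase", "truncate_20"],
                }
            },
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "title_sort": {"type": "text", "analyzer": "normalized_keyword"}
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```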

Nonetheless, I still don't get the example of when the truncate token filter could be useful.
The documentation mentions sorting, but... shouldn't I rather use doc_values if I were to sort on a keyword field? Or maybe you mean that if I insisted, for some reason, on sorting on a text field, truncating tokens would be helpful (but harmful, I suppose, for the search case...)?

Best regards,
Kasia

