Say, for example, I have a field that I know only ever receives one-word strings. Why would I still use type "text" rather than type "keyword" for this field's mapping?
Setting it to type "keyword" would let me search it (since every value is a single word, the results should be the same as with the "text" type), use it in filters, and use it in aggregations.
I'm trying to understand if there's a catch here.
How is the data ingested if the mapping is set to a multi-field of "text" and "keyword"? Does it get indexed twice, even if it's a single word: once for the text field and once for the keyword field?
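For reference, a multi-field mapping like the one being asked about might look like the sketch below, written as a Python dict you could serialize as the request body (the field name `my_field` is hypothetical). The value is stored once in `_source` but indexed into two fields: `my_field` (analyzed text) and `my_field.keyword` (the exact value). The `ignore_above: 256` part mirrors what Elasticsearch's default dynamic mapping generates for string values.

```python
import json

# Hypothetical field "my_field": the same source value is indexed into two fields,
# "my_field" (type text, analyzed) and "my_field.keyword" (type keyword, verbatim).
multi_field_mapping = {
    "mappings": {
        "properties": {
            "my_field": {
                "type": "text",
                "fields": {
                    # Values longer than 256 characters are not indexed in the keyword sub-field.
                    "keyword": {"type": "keyword", "ignore_above": 256}
                },
            }
        }
    }
}

# Print the JSON body you would send when creating the index.
print(json.dumps(multi_field_mapping, indent=2))
```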
Yes, but only for exact matches, unless you also apply a lowercase normalizer or maybe an asciifolding normalizer...
But you're right: if you need sorting, aggregations, and exact-match search, then using a keyword is the right choice.
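A keyword field with such a normalizer could be declared roughly like this. It's a sketch: `lowercase_normalizer` and `my_field` are hypothetical names. The normalizer is defined under the index's `analysis` settings (here combining the `lowercase` and `asciifolding` filters) and then referenced from the field mapping.

```python
import json

# Hypothetical index body: a custom normalizer "lowercase_normalizer" that
# lowercases and ASCII-folds keyword values, attached to a keyword field.
keyword_with_normalizer = {
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {
                    "type": "custom",
                    "char_filter": [],
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # The whole value stays a single term; only its case/accents are normalized.
            "my_field": {"type": "keyword", "normalizer": "lowercase_normalizer"}
        }
    },
}

print(json.dumps(keyword_with_normalizer, indent=2))
```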
Is there a catch there?
Will using a keyword with a lowercase normalizer be the same as using the text type? (When speaking about one-word values, of course.)
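One way to see where the two can differ is a rough simulation. The sketch below is only an approximation: it assumes the text field uses the standard analyzer (which lowercases tokens) and the keyword field uses a lowercase normalizer, and the regex split is a hypothetical stand-in for the analyzer's real Unicode segmentation rules. For a plain single word the resulting terms match, but a "one-word" value containing punctuation such as a hyphen gets split into several terms by the analyzer, while the normalizer keeps it as one term.

```python
import re

def text_like_tokens(value):
    # Rough stand-in for the standard analyzer: split on non-word characters,
    # then lowercase each token. Real Unicode segmentation is more subtle.
    return [token.lower() for token in re.findall(r"\w+", value)]

def keyword_like_term(value):
    # Keyword field with a lowercase normalizer: the whole value is one term.
    return [value.lower()]

for value in ["Elasticsearch", "New-York"]:
    print(value, text_like_tokens(value), keyword_like_term(value))
```

To check the real behavior on an index, the `_analyze` API shows the terms an analyzer or normalizer actually produces for a given input.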
I'm trying to understand if there will be an improvement in terms of performance/storage.
I'm asking because you said that it does get indexed twice when setting the field's mapping to multi-field (text + keyword).
I wonder whether indexing a field only as keyword, and not as text as well, will save some disk space (or improve performance).
Additionally, if I set a field to be only of type keyword, will it still appear under "Available Fields" (the side navigation bar; please see the screenshot below)?
Will there be a difference in terms of performance, or is it not significant?
I ask because I want to add this normalizer to every keyword-only field in my mapping file, and there are many of them, so I wonder whether it matters.
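If the mapping file is kept as JSON, one hypothetical way to do this in bulk is a small script that walks the `properties` and sets the normalizer on every keyword field. This is a sketch, assuming the normalizer (here called `lowercase_normalizer`) is already defined in the index's analysis settings; the helper names are made up for illustration. It skips keyword sub-fields of multi-fields and never overwrites a normalizer that is already set.

```python
import copy

def add_normalizer(properties, normalizer_name):
    """Return a copy of a mapping's "properties" where every field of type
    "keyword" gets `normalizer_name`, unless it already has a normalizer.

    Object/nested fields are walked through their own "properties". Keyword
    sub-fields of multi-fields (under "fields") are deliberately left alone.
    """
    result = copy.deepcopy(properties)  # leave the caller's mapping untouched
    _apply_normalizer(result, normalizer_name)
    return result

def _apply_normalizer(properties, normalizer_name):
    for field in properties.values():
        if field.get("type") == "keyword":
            field.setdefault("normalizer", normalizer_name)
        if "properties" in field:  # object or nested field: recurse
            _apply_normalizer(field["properties"], normalizer_name)

# Hypothetical example mapping.
properties = {
    "status": {"type": "keyword"},
    "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "user": {"properties": {"country": {"type": "keyword"}}},
}

updated = add_normalizer(properties, "lowercase_normalizer")
print(updated)
```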
I believe the performance is pretty much the same. The biggest difference that comes to mind is that keyword fields get doc values and text fields can't. So in this case keyword will take up more disk space than text, because of the doc values. You can turn off doc values, and then everything should be roughly the same again.
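Turning doc values off for a keyword field is a single mapping parameter. A sketch, with a hypothetical field name `my_field`: the field stays indexed, so it can still be searched and filtered on, but without doc values it can no longer be used for sorting or aggregations, so this only makes sense for fields that don't need them.

```python
import json

# Hypothetical field "my_field": keyword, still indexed (searchable/filterable),
# but with doc values disabled, so no sorting or aggregations on it.
keyword_no_doc_values = {
    "mappings": {
        "properties": {
            "my_field": {"type": "keyword", "doc_values": False}
        }
    }
}

# json.dumps renders Python False as JSON false.
print(json.dumps(keyword_no_doc_values, indent=2))
```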