As far as I know, a text field goes through three steps when a document is indexed in Elasticsearch:
First - character filtering,
Second - tokenizing,
Third - token filtering.
I have the following mapping in my index:
"mappings": {
  "properties": {
    "field1": {
      "type": "text",
      "analyzer": "whitespace"
    }
  }
}
So I am wondering what exactly the whitespace analyzer does to the text field in this situation.
Is there any character filtering on field1 by default? As far as I know, if I don't set a char filter, there is none, and tokenizing is the first step. The tokenizer then only splits the text on whitespace. After that, if I don't set a token filter, the default is just the lowercase filter. Is that correct?
Yes, you are exactly right about the first two steps. You can see how the whitespace analyzer is defined in the documentation. It has no character filters, so the first step is the whitespace tokenizer, which breaks strings on whitespace. Finally, there are no token filters. So that's really all that this analyzer does: it breaks strings on whitespace.
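To check this yourself, you could send a request body like the one below to the _analyze API (for example with POST _analyze). This is only a sketch, and the sample text is made up for illustration:

```json
{
  "analyzer": "whitespace",
  "text": "Quick Brown-Fox jumped!"
}
```

The tokens that come back should be Quick, Brown-Fox and jumped!, with case and punctuation left untouched, since nothing runs before or after the tokenizer.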
As for the lowercase filter being the default: no, there are no token filters in this analyzer at all. If you would like to use the whitespace tokenizer in combination with the lowercase token filter, you would have to create a custom analyzer that combines the two.
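A minimal sketch of such a custom analyzer, written as the body of an index creation request, could look like this. The analyzer name whitespace_lowercase is a hypothetical name chosen for this example; field1 is taken from the mapping in the question:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "analyzer": "whitespace_lowercase"
      }
    }
  }
}
```

With this in place, field1 would still be split on whitespace only, but each token would be lowercased before it is indexed.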
Abdon, one more question please: the analyzer in my example above is an index-time analyzer, right? And at search time, do I need to specify which analyzer should be used for that field?
The analyzer defined in the mapping is used at index time, and by default it is also used at search time: when you run a full-text query on the field, Elasticsearch applies that same analyzer to the query terms. There is no need to specify the analyzer at search time.
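As an illustration, a search body like the following could be sent to the _search endpoint of the index (the index name, say my-index, and the query text are assumptions made for this example). Note that no analyzer is named anywhere in the request:

```json
{
  "query": {
    "match": {
      "field1": "Quick Fox"
    }
  }
}
```

With the mapping from the question, the query text would be split on whitespace into Quick and Fox and matched as-is. Because the whitespace analyzer does not lowercase, a search for quick would not be expected to match a document that was indexed with Quick.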