Whitespace analyzer (char-filter And token-filter)

Irakli · October 29, 2019, 12:33pm

Hi there,

As I understand it, a text field goes through three steps when a document containing it is indexed in Elasticsearch:
First - character filtering,
Second - tokenizing,
Third - token filtering.
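The three stages can be sketched as a small pipeline: any number of character filters, exactly one tokenizer, then any number of token filters, applied in that order. This is a simplified Python model for illustration only, not Elasticsearch's implementation, and the components `strip_hyphens`, `split_on_whitespace` and `lowercase` are hypothetical stand-ins for real char filters, tokenizers and token filters.

```python
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def analyze(text: str,
            char_filters: List[CharFilter],
            tokenizer: Tokenizer,
            token_filters: List[TokenFilter]) -> List[str]:
    # Step 1: character filters rewrite the raw string, in order.
    for char_filter in char_filters:
        text = char_filter(text)
    # Step 2: a single tokenizer splits the string into tokens.
    tokens = tokenizer(text)
    # Step 3: token filters transform the token list, in order.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


# Hypothetical components, for illustration only.
strip_hyphens: CharFilter = lambda s: s.replace("-", " ")
split_on_whitespace: Tokenizer = str.split
lowercase: TokenFilter = lambda toks: [t.lower() for t in toks]

print(analyze("Quick-Brown Fox", [strip_hyphens], split_on_whitespace, [lowercase]))
# ['quick', 'brown', 'fox']
```

With empty char-filter and token-filter lists, the pipeline reduces to the tokenizer alone.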

I have the following mapping in my index:
"mappings": {
  "properties": {
    "field1": {
      "type": "text",
      "analyzer": "whitespace"
    }
  }
}

So I am wondering what exactly the whitespace analyzer does to the text field in this situation.
Is there any character filtering on field1 by default? As I understand it, if I don't set a char filter, there will be none, so the first step will be tokenizing. The tokenizer then only splits the text on spaces. After that, if I don't set a token filter, the default will be just the lowercase filter. Is that correct?
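One way to check what an analyzer actually does is Elasticsearch's `_analyze` API, which returns the tokens produced for a given text. The sketch below only builds the JSON request bodies in Python; the sample text and the index name `my-index` are hypothetical, and sending the requests (for example with `curl` or a client library) is left out.

```python
import json

# Body for GET /_analyze: run the built-in whitespace analyzer by name.
analyze_by_name = {
    "analyzer": "whitespace",
    "text": "The QUICK Brown-Fox",
}

# Body for GET /my-index/_analyze ("my-index" is a hypothetical index name):
# Elasticsearch uses whatever analyzer the mapping configures for field1.
analyze_by_field = {
    "field": "field1",
    "text": "The QUICK Brown-Fox",
}

print(json.dumps(analyze_by_name, indent=2))
print(json.dumps(analyze_by_field, indent=2))
```

The field-based form is handy because it tests the mapping as configured, rather than an analyzer you assume the field uses.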

abdon · October 29, 2019, 1:00pm

You are mostly right. You can see how the whitespace analyzer is defined in the documentation. The whitespace analyzer has no character filters, so the first step is the whitespace tokenizer, which breaks strings on whitespace. After that, there are no token filters. So that's really all this analyzer does: it breaks strings on whitespace.
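The behaviour described above can be approximated locally in Python. This is a rough stand-in, not the Lucene tokenizer: `str.split()` treats a slightly different set of characters as whitespace than Elasticsearch does (for example the non-breaking space), and the sketch ignores the tokenizer's `max_token_length` setting.

```python
from typing import List


def whitespace_analyzer(text: str) -> List[str]:
    # No character filters: the raw text goes straight to the tokenizer.
    # Whitespace tokenizer, approximated with str.split(), which splits on
    # runs of whitespace (spaces, tabs, newlines) and drops empty strings.
    tokens = text.split()
    # No token filters: tokens come back unchanged, so case is preserved.
    return tokens


print(whitespace_analyzer("The QUICK  brown-fox\tjumps"))
# ['The', 'QUICK', 'brown-fox', 'jumps']
```

Note that `QUICK` stays uppercase and `brown-fox` stays a single token, since nothing after the tokenizer touches them.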

Irakli · October 29, 2019, 1:04pm

Thanks for such a quick answer! But what about token filters? Isn't there a lowercase token filter after the tokenizing step?

abdon · October 29, 2019, 1:06pm

No, there are no token filters in this analyzer. If you would like to use the whitespace tokenizer in combination with the lowercase token filter, you would have to create a custom analyzer that combines these two.
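A custom analyzer that combines the built-in `whitespace` tokenizer with the built-in `lowercase` token filter could look roughly like the index body below, written as a Python dict so it can be printed as JSON. The analyzer name `whitespace_lowercase` is hypothetical, and the local `whitespace_lowercase` function is only an approximation of what Elasticsearch would produce.

```python
import json
from typing import List

# Create-index request body. "whitespace_lowercase" is a hypothetical name;
# "whitespace" and "lowercase" are built-in Elasticsearch components.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "whitespace_lowercase": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "field1": {"type": "text", "analyzer": "whitespace_lowercase"}
        }
    },
}


def whitespace_lowercase(text: str) -> List[str]:
    # Local approximation: split on whitespace, then lowercase each token.
    return [token.lower() for token in text.split()]


print(json.dumps(index_body, indent=2))
print(whitespace_lowercase("The QUICK Brown-Fox"))
# ['the', 'quick', 'brown-fox']
```

Analyzer settings are fixed once an index is created, so a change like this normally means creating a new index and reindexing the data into it.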

Irakli · October 29, 2019, 1:09pm

Thank you, Abdon!

Irakli · October 29, 2019, 4:07pm

Abdon, one more question please: the analyzer in my example above is an index-time analyzer, right? And at search time, is it necessary to specify which analyzer should be used with that field?

abdon · October 30, 2019, 12:12pm

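By default, when you run a full-text query (such as a match query) against a field, Elasticsearch analyzes the query text with the same analyzer that is defined for that field in the mapping, unless a separate search_analyzer is configured. There is no need to specify the analyzer at search time.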
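Why using the same analyzer at index and search time matters can be shown with a toy inverted index. This Python sketch is a simplified model, not how Elasticsearch stores or scores data: it approximates the whitespace analyzer with `str.split()`, ignores relevance scoring, and uses hypothetical sample documents. The `match_query` function mimics the default OR behaviour of a match query.

```python
from typing import Dict, List, Set


def whitespace_analyzer(text: str) -> List[str]:
    # Approximation of the built-in whitespace analyzer (no lowercasing).
    return text.split()


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    # Index time: analyze each document and record which docs contain each term.
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in whitespace_analyzer(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def match_query(index: Dict[str, Set[int]], query: str) -> Set[int]:
    # Search time: analyze the query text with the same analyzer as the field,
    # then return docs containing any resulting term (OR semantics).
    hits: Set[int] = set()
    for term in whitespace_analyzer(query):
        hits |= index.get(term, set())
    return hits


index = build_index({1: "The QUICK fox", 2: "a quick dog"})
print(match_query(index, "QUICK"))  # {1}
print(match_query(index, "quick"))  # {2}
print(match_query(index, "Quick"))  # set()
```

Because the whitespace analyzer does not lowercase, matching stays case-sensitive; swapping in a whitespace-plus-lowercase analyzer on both sides would make all three queries hit both documents.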

system · November 27, 2019, 12:12pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
