How does Elasticsearch treat punctuation marks on index?

shaunak · May 1, 2015, 10:14am

QUESTION

To my understanding, Elasticsearch treats all punctuation marks as word breaks.

We are looking for a recommended strategy for handling the following scenarios:

'BOO' should not be found when the end user's search text is 'BOO!'
'Men of steel' should not be found when the end user's search text is 'x-men'

ANSWER

By default, ES uses the "standard analyzer", which itself uses the "standard tokenizer".
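To see why the default behaviour produces the unwanted matches, here is a rough sketch in Python. It does not call Elasticsearch; the function `approx_standard_analyze` is a hypothetical stand-in that splits on non-word characters and lowercases, which only approximates what the standard analyzer does (the real tokenizer follows Unicode word-boundary rules and differs on apostrophes, underscores, and similar cases).

```python
import re


def approx_standard_analyze(text):
    """Hypothetical approximation of the standard analyzer:
    split on runs of non-word characters, then lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Tokens produced for the indexed document and for the search text.
doc_tokens = approx_standard_analyze("Men of steel")
query_tokens = approx_standard_analyze("x-men")

# Any shared token is enough for a basic match query to hit the document.
shared = set(doc_tokens) & set(query_tokens)
print(sorted(shared))
```

Under this approximation the hyphen in 'x-men' acts as a break, so the query contains the token 'men' and matches 'Men of steel'. Likewise 'BOO!' reduces to the same token as 'BOO'.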

You can create a custom analyzer and specify a different tokenizer, such as the whitespace tokenizer, which splits only on whitespace and therefore preserves all punctuation.
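As a minimal sketch, the index settings for such an analyzer could look like the following, built here as a Python dict and printed as JSON. The analyzer name `punct_preserving` is made up for illustration, and the `lowercase` filter is an optional addition that is not part of the suggestion above; drop it if matching should be case-sensitive. The field mapping, which has to reference the analyzer by name, is left out because its syntax depends on the Elasticsearch version.

```python
import json

# Hypothetical analyzer name, chosen only for this example.
ANALYZER_NAME = "punct_preserving"

# Settings body to send when creating the index.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                ANALYZER_NAME: {
                    "type": "custom",
                    # Split on whitespace only, so "x-men" and "BOO!" stay whole.
                    "tokenizer": "whitespace",
                    # Assumption: case-insensitive matching is still wanted.
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

print(json.dumps(index_body, indent=2))
```

With this analyzer, 'BOO!' would be indexed as the single token 'boo!' and 'x-men' as 'x-men', so neither collides with 'boo' or 'men'. The trade-off is that trailing punctuation in ordinary text, as in 'steel.', also stays attached to the token.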

You can test your analyzer with the analyze API: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/analysis-intro.html#analyze-api
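A hedged sketch of preparing such a test request from Python using only the standard library. The host `http://localhost:9200` and the helper `build_analyze_request` are assumptions for illustration, and the JSON body format shown is the one accepted by more recent Elasticsearch releases; older releases took the tokenizer and text as query-string parameters instead, so check the documentation for your version.

```python
import json
import urllib.request

# Assumption: a local node on the default port.
ES_URL = "http://localhost:9200"


def build_analyze_request(text, tokenizer="whitespace"):
    """Hypothetical helper: prepare (but do not send) an _analyze request
    that runs the given built-in tokenizer over the text."""
    body = json.dumps({"tokenizer": tokenizer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        ES_URL + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("x-men BOO!")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The response lists the tokens the tokenizer produced, which makes it easy to compare the whitespace and standard tokenizers on the same input before changing any index.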
