How does Elasticsearch treat punctuation marks at index time?

QUESTION

As I understand it, Elasticsearch treats all punctuation marks as word breaks.

We are looking for a recommended strategy for handling the following scenarios:

  1. 'BOO' should not be found when the end user's search text is 'BOO!'
  2. 'Men of steel' should not be found when the end user's search text is 'x-men'

ANSWER

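By default, Elasticsearch uses the "standard analyzer", which itself uses the "standard tokenizer". The standard tokenizer splits text on word boundaries and discards most punctuation, and the analyzer then lowercases the result. So 'BOO!' becomes the single term 'boo', and 'x-men' becomes the two terms 'x' and 'men', which is why it can match 'Men of steel'.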
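
To make the default behaviour concrete, here is a rough Python approximation of what happens to the two search strings. This is only a sketch: the real standard tokenizer follows Unicode text segmentation rules and is not a simple regex, and the function name approx_standard_analyze is made up for illustration.

```python
import re
from typing import List


def approx_standard_analyze(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer (NOT Lucene's real algorithm):
    keep runs of word characters, drop everything else, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


print(approx_standard_analyze("BOO!"))          # ['boo']
print(approx_standard_analyze("x-men"))         # ['x', 'men']
print(approx_standard_analyze("Men of steel"))  # ['men', 'of', 'steel']
```

The term 'men' appears in the output for both 'x-men' and 'Men of steel', so a query for one can hit the other.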

You can create a custom analyzer and specify a different tokenizer, such as the whitespace tokenizer, which splits only on whitespace and therefore preserves all punctuation.
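
As a minimal sketch, the index-creation body for such an analyzer might look like the following, built here as a Python dict and printed as JSON. The analyzer name whitespace_lowercase and the field name title are hypothetical. The lowercase filter is my own addition to keep matching case-insensitive, so drop it if you want exact case. The mappings section assumes the typeless mapping format of Elasticsearch 7 and later.

```python
import json

# Hypothetical names: "whitespace_lowercase" (analyzer) and "title" (field).
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "whitespace_lowercase": {
                    "type": "custom",
                    # Split on whitespace only, so "BOO!" and "x-men" stay whole.
                    "tokenizer": "whitespace",
                    # Optional assumption: lowercase tokens for case-insensitive matching.
                    "filter": ["lowercase"],
                }
            }
        }
    },
    # Assumes typeless mappings (Elasticsearch 7+); older versions nest
    # "properties" under a mapping type name.
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "whitespace_lowercase"}
        }
    },
}

# This JSON would be sent as the body of the index-creation request.
print(json.dumps(index_body, indent=2))
```

One trade-off to keep in mind: with the whitespace tokenizer, ordinary sentence punctuation is preserved too, so 'Hello, world' is indexed with the term 'hello,' (comma included).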

You can test your analyzer using the analyze API: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/analysis-intro.html#analyze-api
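
For example, a request to the analyze API could be put together like this. It is a sketch under several assumptions: a cluster at http://localhost:9200, an index called my_index, and a custom analyzer on that index called whitespace_lowercase (all hypothetical names). It uses the JSON request body accepted by recent Elasticsearch versions; very old versions took the analyzer and text as query parameters instead. The request is only built here, not sent.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a POST request to the index's _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, index, and analyzer names.
req = build_analyze_request(
    "http://localhost:9200", "my_index", "whitespace_lowercase", "x-men BOO!"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To send it against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The response lists the tokens the analyzer produced, so you can check whether 'x-men' and 'BOO!' survive as single terms before reindexing anything.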