To see how the analyzers work, I send a GET request to the following endpoint on each of our Elasticsearch (ES) environments:
http://[elastic search endpoint]/_analyze?analyzer=standard&text=eee.fe.Esddasdae.ds64.Cl
It returns 2 tokens:
"eee.fe.Esddasdae.ds64" and "Cl"
The final "Cl" is treated as a separate token from the rest. This should not happen. A "." does not need to be escaped in ES, as it is not considered a special character (see the list of reserved characters here: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-query-string-query.html#_reserved_characters)
After experimenting with this further, I've noticed that if the character immediately before a "." is a digit, the string is tokenised at that point.
For comparison, running http://[elastic search endpoint]/_analyze?analyzer=standard&text=eee.fe.Esddasdae.ds.Cl
returns only 1 token: eee.fe.Esddasdae.ds.Cl
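For anyone who wants to reproduce this, here is a minimal Python sketch of the two requests. The host http://localhost:9200 and the helper names analyze_url and analyze are placeholders I made up for illustration, and the sketch assumes the response is the usual JSON body with a "tokens" list, so adjust it to your own environment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder host, replace with the real Elasticsearch endpoint.
ES_ENDPOINT = "http://localhost:9200"


def analyze_url(text, analyzer="standard", endpoint=ES_ENDPOINT):
    """Build the _analyze URL with the same query parameters as above."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"{endpoint}/_analyze?{query}"


def analyze(text, analyzer="standard", endpoint=ES_ENDPOINT):
    """Send the GET request and return only the token strings.

    Assumes a response body shaped like {"tokens": [{"token": "..."}, ...]}.
    """
    with urlopen(analyze_url(text, analyzer, endpoint)) as response:
        body = json.load(response)
    return [entry["token"] for entry in body["tokens"]]
```

Calling analyze("eee.fe.Esddasdae.ds64.Cl") and analyze("eee.fe.Esddasdae.ds.Cl") against a running cluster should let you compare the two token lists side by side.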
Why does this happen, and how can we avoid this?
Thank you