I'm setting up a project in which I need several normalizations for a given token.
For example, the text:
I Don't go
should be returned by each of these queries:
I dont go
I don t go
I don't go
Thus I need the token "Don't" to be normalized to both "dont" and "don t".
Stemming is not an option, because some tokens are invented words (e.g. C.O.R.E should be normalized to both CORE and C O R E).
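To make the requirement concrete, here is a rough sketch in plain Python of the behaviour I'm after. The function name `token_variants` is made up for illustration and is not part of any search library. It also assumes matching is case-insensitive, so everything is lowercased first.

```python
import re

def token_variants(token):
    """Return the normalized forms a single token should be indexed under.

    Hypothetical helper: keeps the lowercased original, a form with the
    punctuation removed, and a form with the punctuation replaced by spaces.
    """
    lowered = token.lower()
    # Split on any run of non-word characters (apostrophes, dots, ...) or underscores.
    parts = [p for p in re.split(r"[\W_]+", lowered) if p]
    candidates = [lowered, "".join(parts), " ".join(parts)]
    # Drop duplicates while keeping the order (plain words yield one form).
    variants = []
    for candidate in candidates:
        if candidate not in variants:
            variants.append(candidate)
    return variants

print(token_variants("Don't"))    # ["don't", 'dont', 'don t']
print(token_variants("C.O.R.E"))  # ['c.o.r.e', 'core', 'c o r e']
```

What I don't know is how to get this kind of multi-form output at indexing time, so that all variants of a token match the same document.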
Any idea on how to solve this?