Besides using plugins to help tokenize documents, is there any other way to identify similar words in different scripts, or synonyms, without specifying them in synonyms.txt?
Hey Shabana,
You can try vector search with our multilingual model, E5, which captures the semantic similarity between synonyms without you having to build a list.
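To give a rough idea of what that looks like, here is a minimal sketch of a kNN search body that lets Elasticsearch embed the query text at search time. The field name `content_embedding` is hypothetical, and the model ID `.multilingual-e5-small` is an assumption you should check against whatever is actually deployed in your cluster.

```python
import json


def build_knn_query(text, field="content_embedding",
                    model_id=".multilingual-e5-small",
                    k=10, num_candidates=100):
    """Build a kNN search body that embeds `text` with a deployed model.

    `field` is a hypothetical dense_vector field name and `model_id` is an
    assumed deployment ID; replace both with the values from your cluster.
    """
    return {
        "knn": {
            "field": field,
            "k": k,
            "num_candidates": num_candidates,
            # The query text is turned into a vector by the model at search time.
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": text,
                }
            },
        }
    }


body = build_knn_query("자동차")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

You would send this body to the `_search` endpoint of an index whose documents were embedded with the same model at ingest time. Since matching happens on vectors rather than tokens, a query in one script can match documents written in another without a synonyms entry.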
This is a great blog for an overview of how it works (with some examples in Korean as well):
In that article, Josh also references an earlier blog of his that goes a bit further into using the plugins you described, combined with a language identification model in the pipeline (so if you're dealing with multiple languages in your search, the best plugin can be picked automatically):
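Separately from the blog, here is a rough sketch of the routing idea: take the language predictions for a document and pick an analyzer from them. The shape of `predictions` is hypothetical, and the blog does this inside an ingest pipeline rather than in client code, so treat this purely as an illustration. The analyzer names assume the analysis-nori, analysis-kuromoji, and analysis-smartcn plugins are installed.

```python
# Analyzer to use per detected language code (assumes the matching
# analysis plugins are installed on the cluster).
ANALYZER_BY_LANGUAGE = {
    "ko": "nori",
    "ja": "kuromoji",
    "zh": "smartcn",
}
DEFAULT_ANALYZER = "standard"


def pick_analyzer(predictions, min_probability=0.6):
    """Return the analyzer for the most likely language.

    `predictions` is a hypothetical list of dicts such as
    {"language": "ko", "probability": 0.97}. Falls back to the default
    analyzer when there is no prediction, the top prediction is not
    confident enough, or the language has no dedicated analyzer.
    """
    if not predictions:
        return DEFAULT_ANALYZER
    top = max(predictions, key=lambda p: p["probability"])
    if top["probability"] < min_probability:
        return DEFAULT_ANALYZER
    return ANALYZER_BY_LANGUAGE.get(top["language"], DEFAULT_ANALYZER)


example = [
    {"language": "ko", "probability": 0.97},
    {"language": "ja", "probability": 0.02},
]
print(pick_analyzer(example))
```

The confidence threshold is a design choice for this sketch: short or mixed-language text tends to produce weak predictions, and falling back to a general analyzer is usually safer than picking the wrong language-specific one.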