Non-romanized languages like Chinese, Korean, Japanese

Shabana_Rumane · April 4, 2025, 11:18am

Apart from using plugins to help tokenize documents, is there any way to identify similar words in different scripts, or synonyms, without specifying them in synonyms.txt?

iulia · April 23, 2025, 8:57am

Hey Shabana,

You can try vector search with our multilingual model, E5, which captures the semantic similarity between synonyms without you having to build a list.
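As a rough sketch of what such a query could look like: the snippet below only builds the request body for a kNN search that embeds the query text at search time. The field name `content_embedding` is hypothetical, and the model ID `.multilingual-e5-small` assumes the E5 model is deployed under that ID in your cluster and that your documents were embedded with the same model at ingest time.

```python
import json


def build_semantic_query(text, field="content_embedding",
                         model_id=".multilingual-e5-small",
                         k=10, num_candidates=100):
    """Build a kNN search body that lets Elasticsearch embed `text` itself.

    `field` and `model_id` are assumptions for illustration; replace them
    with the dense_vector field and deployed model from your own setup.
    """
    return {
        "knn": {
            "field": field,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": text,
                }
            },
            "k": k,
            "num_candidates": num_candidates,
        }
    }


# The query text can be in any script; no synonyms.txt entry is involved.
body = build_semantic_query("고양이")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

You would send this body to the `_search` endpoint of the index that holds the embeddings. Matching then depends on vector similarity rather than on shared tokens.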

This is a great blog to get an overview of how it works (with some examples in Korean as well):

Josh also references a previous blog of his in that article that goes a bit more into using the plugins you described, but with a strategy of also using a language identification model in the pipeline (if you're dealing with multiple languages in your search, you can automatically pick the best plugin):
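To illustrate the routing idea in a minimal form: once a language identification step has produced a language code for a document, you can map that code to an analyzer. The mapping and function below are hypothetical and not taken from the blog. The analyzer names assume the analysis-smartcn, analysis-kuromoji, and analysis-nori plugins are installed.

```python
# Hypothetical mapping from a detected language code to an analyzer name.
# Assumes the smartcn, kuromoji, and nori analysis plugins are installed.
ANALYZER_BY_LANGUAGE = {
    "zh": "smartcn",
    "ja": "kuromoji",
    "ko": "nori",
}


def pick_analyzer(lang_code, default="standard"):
    """Return the analyzer for a detected language, or a fallback."""
    return ANALYZER_BY_LANGUAGE.get(lang_code, default)


for code in ("zh", "ja", "ko", "en"):
    print(code, "->", pick_analyzer(code))
```

Anything the mapping does not cover falls back to the `standard` analyzer, so unexpected languages are still indexed.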
