Vietnamese support


The documentation [1] mentions the Vietnamese analysis plugin.

I already use the ICU plugin [2] for multi-language support.

How much better is the Vietnamese plugin than ICU for the Vietnamese language?
I don't speak Vietnamese, so I can't judge.

Does anyone have experience with both of them?


[1] Analysis Plugins | Elasticsearch Plugins and Integrations [master] | Elastic
[2] ICU Analysis Plugin | Elasticsearch Plugins and Integrations [master] | Elastic

I was able to answer the question myself.

There is an example in the README on GitHub (duydo/elasticsearch-analysis-vietnamese: Vietnamese Analysis Plugin for Elasticsearch). The input is:
"công nghệ thông tin Việt Nam rất phát triển trong những năm gần đây"

With the Vietnamese analyzer it should return:
["công nghệ", "thông tin", "việt nam", "phát triển", "trong", "năm", "gần đây", "."]

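ICU returns: "công", "nghệ", "thông", "tin", "việt", "nam" ...

So the Vietnamese analyzer is much better: it keeps multi-syllable words such as "công nghệ" and "việt nam" together, while ICU splits them at every space.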
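
For anyone who wants to reproduce the comparison, here is a minimal sketch in Python that sends the same text to the `_analyze` API once per plugin. It assumes a node at http://localhost:9200 without authentication and with both plugins installed. `vi_analyzer` is the analyzer name from the plugin's README as far as I know, and the helper names (`analyze_body`, `tokens_from_response`, `analyze`, `compare`) are my own, not part of any library.

```python
import json
from urllib import request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication

TEXT = "công nghệ thông tin Việt Nam rất phát triển trong những năm gần đây"


def analyze_body(text, analyzer=None, tokenizer=None, filters=None):
    """Build a request body for the _analyze API."""
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    if tokenizer is not None:
        body["tokenizer"] = tokenizer
    if filters:
        body["filter"] = list(filters)
    return body


def tokens_from_response(response):
    """Extract the token strings from a parsed _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze(body, es_url=ES_URL):
    """POST the body to /_analyze and return the list of tokens."""
    req = request.Request(
        es_url + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return tokens_from_response(json.load(resp))


def compare(text=TEXT, es_url=ES_URL):
    """Return the tokens produced by each plugin for the same text."""
    return {
        # assumption: analyzer name provided by the Vietnamese plugin
        "vietnamese": analyze(analyze_body(text, analyzer="vi_analyzer"), es_url),
        # ICU tokenizer plus lowercasing, to match the lowercased output above
        "icu": analyze(
            analyze_body(text, tokenizer="icu_tokenizer", filters=["lowercase"]),
            es_url,
        ),
    }
```

Calling `compare()` against a running node should give the two token lists side by side. I have left out expected output on purpose, since it depends on the plugin versions installed.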

Hi Jean-Marc,
Your conclusion above is right.
So we should use the language-specific analyzer for Vietnamese :)
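
If ICU stays in place for the other languages, one option is to add the Vietnamese analyzer on a sub-field only. Below is a rough sketch of such an index body as a Python dict. Everything specific in it is an assumption: typeless mappings as in Elasticsearch 7 and later, a hypothetical field named `content`, a hypothetical custom analyzer named `icu_text`, and `vi_analyzer` as the analyzer name registered by the Vietnamese plugin.

```python
def build_index_body(field="content"):
    """Index body with ICU on the main field and Vietnamese on a sub-field."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # hypothetical name for a simple ICU-based analyzer
                    "icu_text": {
                        "type": "custom",
                        "tokenizer": "icu_tokenizer",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "icu_text",
                    "fields": {
                        # sub-field analyzed with the Vietnamese plugin
                        "vi": {"type": "text", "analyzer": "vi_analyzer"}
                    },
                }
            }
        },
    }
```

Queries for Vietnamese text would then target `content.vi`, while other languages keep using `content`.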
