Hi,
The documentation [1] mentions the Vietnamese analysis plugin.
I already use the ICU plugin [2] for multi-language support.
How much better is the Vietnamese plugin than ICU for Vietnamese text?
I don't speak Vietnamese, so I can't judge.
Does anyone have experience with both of them?
Thanks,
Jean-Marc
[1] Analysis Plugins | Elasticsearch Plugins and Integrations [master] | Elastic
[2] ICU Analysis Plugin | Elasticsearch Plugins and Integrations [master] | Elastic
I was able to answer the question myself.
There is an example in the plugin's GitHub repository, duydo/elasticsearch-analysis-vietnamese (Vietnamese Analysis Plugin for Elasticsearch), which analyzes this sentence:
"công nghệ thông tin Việt Nam rất phát triển trong những năm gần đây"
The Vietnamese analyzer should return:
["công nghệ", "thông tin", "việt nam", "phát triển", "trong", "năm", "gần đây", "."]
ICU returns: "công", "nghệ", "thông", "tin", "việt", "nam" ...
So the Vietnamese analyzer is much better for Vietnamese: it keeps multi-syllable words such as "công nghệ" together as single tokens, while ICU splits them into separate syllables.
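For anyone who wants to reproduce the comparison above on their own cluster, here is a minimal sketch that sends the sample sentence to the `_analyze` API once per analyzer. It assumes a node at `http://localhost:9200` without authentication, with both plugins installed, and it assumes the analyzer names `vi_analyzer` (Vietnamese plugin) and `icu_analyzer` (ICU plugin); adjust these to your setup.

```python
import json
import urllib.request

# Assumption: local, unauthenticated node with both plugins installed.
ES_URL = "http://localhost:9200"

SAMPLE = "công nghệ thông tin Việt Nam rất phát triển trong những năm gần đây"


def build_analyze_body(analyzer, text):
    """Build the JSON body for a request to the _analyze API."""
    return {"analyzer": analyzer, "text": text}


def extract_tokens(response):
    """Pull the token strings out of a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze(analyzer, text, base_url=ES_URL):
    """Run `text` through `analyzer` on the cluster and return its tokens."""
    data = json.dumps(build_analyze_body(analyzer, text)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/_analyze",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_tokens(json.load(resp))
```

Calling `analyze("vi_analyzer", SAMPLE)` and `analyze("icu_analyzer", SAMPLE)` and printing both lists side by side should make the difference in tokenization visible even without knowing Vietnamese.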
letien (Tien), June 3, 2022, 10:50am
Hi Jean-Marc,
Your conclusion above is right.
For Vietnamese text, we should therefore use the language-specific analyzer rather than ICU.
Cheers,
Tien
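One possible way to combine both suggestions in a multi-language index is a multi-field mapping: the main field keeps the ICU analyzer, and a sub-field uses the Vietnamese analyzer. The sketch below only builds the index body as a Python dict. The field name `content`, the sub-field name `vi`, and the analyzer names `icu_analyzer` and `vi_analyzer` are assumptions for illustration, not something confirmed in this thread.

```python
def build_index_body(field="content"):
    """Build an index body with an ICU-analyzed field and a Vietnamese sub-field.

    Assumptions: both analysis plugins are installed, and they expose the
    analyzers `icu_analyzer` and `vi_analyzer`.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    # General multi-language analysis on the main field.
                    "analyzer": "icu_analyzer",
                    "fields": {
                        # Vietnamese-specific analysis, queried as "<field>.vi".
                        "vi": {"type": "text", "analyzer": "vi_analyzer"}
                    },
                }
            }
        }
    }
```

The resulting dict can be serialized with `json.dumps` and sent as the body of an index creation request. Queries for Vietnamese content would then target `content.vi`, while other languages keep using `content`.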