I'm building a search engine for my non-profit organisation, which publishes in 21 languages. I need to support both traditional and simplified Chinese, as Chinese readers are our second-largest audience.
I have read a few articles (你们好 - Elasticsearch and the Chinese language | mimacom) and it seems I need to use stconvert, as traditional Chinese is not supported in Elasticsearch.
I am also using smartcn for simplified Chinese. Both plugins are on the recommended list (Analysis Plugins | Elasticsearch Plugins and Integrations [8.0] | Elastic) and I have both installed correctly on Elastic Cloud.
However, I'm confused about how to use stconvert with smartcn. As I understand it, the stconvert plugin only converts traditional Chinese characters to simplified Chinese and back.
I can create the demo stconvert index and get the plugin working:
PUT /stconvert/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "tsconvert": {
          "tokenizer": "tsconvert"
        }
      },
      "tokenizer": {
        "tsconvert": {
          "type": "stconvert",
          "delimiter": "#",
          "keep_both": false,
          "convert_type": "t2s"
        }
      },
      "filter": {
        "tsconvert": {
          "type": "stconvert",
          "delimiter": "#",
          "keep_both": false,
          "convert_type": "t2s"
        }
      },
      "char_filter": {
        "tsconvert": {
          "type": "stconvert",
          "convert_type": "t2s"
        }
      }
    }
  }
}
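For reference, this is roughly how I check that the demo index converts text. It is only a sketch: it runs the tsconvert char_filter defined above through the built-in keyword tokenizer, and the sample text is just an illustration I picked. If the conversion works, the returned token should come back in simplified characters.

```
# Sketch only: "tsconvert" is the char_filter from the demo index above;
# the sample text is an arbitrary traditional Chinese word (an assumption).
GET /stconvert/_analyze
{
  "tokenizer": "keyword",
  "char_filter": ["tsconvert"],
  "text": "國際"
}
```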
But when I create the actual traditional Chinese index, I still want to use the smartcn analyser:
PUT /zh-traditional/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "smartcn"
        }
      }
    }
  }
}
Can anyone help me link the two together? I can't find any instructions on how to use the two plugins together. Or do I just need to use the stconvert analyser on its own, and it will use the native Chinese search in the background? From my research, the best search experience seems to come from using both plugins, but I can't figure out how to combine them.
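To make the question concrete, this is the kind of combination I have in mind. It is an untested sketch and rests on two assumptions: that the stconvert char_filter can run ahead of the smartcn_tokenizer that the smartcn plugin provides, and that converting traditional text to simplified before tokenizing is the right order. The names t2s_convert and traditional_smartcn are ones I made up for illustration.

```
# Untested sketch: convert traditional -> simplified with a stconvert
# char_filter, then tokenize the result with smartcn_tokenizer.
# "t2s_convert" and "traditional_smartcn" are hypothetical names.
PUT /zh-traditional/
{
  "settings": {
    "analysis": {
      "char_filter": {
        "t2s_convert": {
          "type": "stconvert",
          "convert_type": "t2s"
        }
      },
      "analyzer": {
        "traditional_smartcn": {
          "type": "custom",
          "char_filter": ["t2s_convert"],
          "tokenizer": "smartcn_tokenizer"
        }
      }
    }
  }
}
```

If this is a sensible approach, I assume the same analyser would also need to be applied at search time so that queries typed in traditional characters are converted the same way.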
Any help would be appreciated.