Chinese Analyzer: Simplified and Traditional (stconvert)

I'm building a search engine for my non-profit organization, which works in 21 languages. I also need to support traditional and simplified Chinese, as Chinese readers are our second-largest audience.

I have read a few articles (你们好 - Elasticsearch and the Chinese language | mimacom), and it seems I need the stconvert plugin, because the Chinese analysis available in Elasticsearch (smartcn) is built for simplified Chinese and doesn't handle traditional Chinese directly.

I am also using smartcn for simplified Chinese. Both plugins are listed on the Analysis Plugins page (Analysis Plugins | Elasticsearch Plugins and Integrations [8.0] | Elastic), and I have both installed correctly on Elastic Cloud.

However, I'm confused about how to use stconvert together with smartcn. As I understand it, the stconvert plugin only converts traditional Chinese characters to simplified Chinese and back.

I can create the demo stconvert index and get the plugin working:

PUT /stconvert/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "tsconvert": {
          "tokenizer": "tsconvert"
        }
      },
      "tokenizer": {
        "tsconvert": {
          "type": "stconvert",
          "delimiter": "#",
          "keep_both": false,
          "convert_type": "t2s"
        }
      },
      "filter": {
        "tsconvert": {
          "type": "stconvert",
          "delimiter": "#",
          "keep_both": false,
          "convert_type": "t2s"
        }
      },
      "char_filter": {
        "tsconvert": {
          "type": "stconvert",
          "convert_type": "t2s"
        }
      }
    }
  }
}
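To check the conversion on its own, the `_analyze` API can run sample text through the `tsconvert` char filter defined in the demo index, paired with the built-in `keyword` tokenizer so the whole string comes back as a single token. The sample text is only an illustration: with `convert_type` set to `t2s`, the traditional 國際 should come back as the simplified 国际.

```
GET /stconvert/_analyze
{
  "tokenizer": "keyword",
  "char_filter": ["tsconvert"],
  "text": "國際"
}
```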

But when I create the actual traditional Chinese index, I still want to use the smartcn analyser:

PUT /zh-traditional/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "smartcn"
        }
      }
    }
  }
}

Can anyone help me link the two together? I can't find any instructions on using the two plugins in combination. Or do I only need the stconvert analyser, and it will use Elasticsearch's native Chinese analysis in the background? From my research, using both plugins seems to give the best search experience, but I can't figure out how to combine them.

Any help would be appreciated.

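This is my work-in-progress answer, though I'm still testing it and trying to understand how it all works. The idea is to use stconvert as a character filter, so traditional characters are converted to simplified ones before the smartcn tokenizer segments the text: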

PUT /zh-traditional/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_smartcn": {
          "tokenizer":  "smartcn_tokenizer",
          "filter": [
            "lowercase",
            "porter_stem",
            "smartcn_stop"
          ],
          "char_filter": [
            "tsconvert"
          ]
        }
      },
      "filter": {
        "tsconvert": {
          "type": "stconvert",
          "keep_both": false,
          "convert_type": "t2s"
        }
      },
      "char_filter": {
        "tsconvert": {
          "type": "stconvert",
          "keep_both": false,
          "convert_type": "t2s"
        }
      }
    }
  }
}
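Defining `rebuilt_smartcn` in the index settings doesn't apply it to anything by itself: a text field has to reference it, or the analyzer has to be named `default`. A minimal sketch, assuming a hypothetical field called `content` and an illustrative sample string, adds the mapping and then runs traditional text through the analyzer to check that stconvert converts it before smartcn segments it:

```
PUT /zh-traditional/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "rebuilt_smartcn"
    }
  }
}

GET /zh-traditional/_analyze
{
  "analyzer": "rebuilt_smartcn",
  "text": "我們的網站"
}
```

Because no `search_analyzer` is set on the field, Elasticsearch uses `rebuilt_smartcn` at query time as well, so queries typed in either traditional or simplified characters should both be converted to simplified before matching. It is worth confirming this with real content and queries.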
