Use latest NLPIR/ICTCLAS for Chinese Word Segmentation

My understanding is that the Smart Chinese Analysis plugin (analysis-smartcn) uses the underlying Lucene plugin org.apache.lucene.analysis.cn.smart, which, according to its documentation, is based on dictionary data from ICTCLAS 1.0.
ICTCLAS has had many updates since. An updated Lucene plugin is available here: https://github.com/NLPIR-team/nlpir-analysis-cn-ictclas

My question is: Is it possible to update the underlying Lucene plugin so that the analysis-smartcn plugin benefits from the updated dictionaries?
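
For anyone who wants to see what the current dictionaries produce, here is a minimal sketch of how the segmentation could be inspected through the `_analyze` API with the `smartcn` analyzer that analysis-smartcn registers. The node address `http://localhost:9200` and the helper names (`build_analyze_request`, `extract_tokens`, `analyze`) are assumptions made for illustration only, and the sketch assumes the plugin is installed and the cluster needs no authentication.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="smartcn", base_url="http://localhost:9200"):
    """Build a POST request for the _analyze API.

    base_url is an assumed local node; adjust it for a real cluster.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_json):
    """Return only the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response_json.get("tokens", [])]


def analyze(text, **kwargs):
    """Send the request and return the segmented tokens (needs a running node)."""
    request = build_analyze_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        return extract_tokens(json.load(response))
```

Calling `analyze("我爱北京天安门")` against a node with the plugin installed should return the list of segments, which could then be compared by hand with the output of a newer NLPIR/ICTCLAS build.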

I can't comment with authority on this area, but if no one else does, I'd recommend creating an issue on GitHub about this, as it seems like a pretty good feature request.
