Can an additional English stemmer interfere with languages in other scripts?

Dong_Hyun_Kim · January 11, 2016, 5:19am

Hello everyone,

I am currently testing the indexing of multi-language documents by combining the basic analysis components that ES provides. I want to add an English stemmer to the Korean, Japanese, and Chinese analyzers to get broader search results for English terms (English words are very likely to appear in every document).
Have you ever used a combination of [tokenizer and token filters for a non-English script] + [English stemmer]?
I tested a few cases and found no side effects. It seems the scripts are clearly distinguishable because they sit in different Unicode blocks.
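
To illustrate what I mean, here is a minimal Python sketch. It is a toy suffix stripper, not the actual stemmer that ES uses; the function name `toy_english_stem` and the suffix list are made up for this example. The assumption is that English stemming rules are written against Latin-letter suffixes, so tokens in Hangul, kana, or Han characters never match them and pass through unchanged.

```python
# Toy stand-in for an English stemmer. This is NOT the Elasticsearch/Lucene
# implementation; it strips a few ASCII suffixes only to show the idea.
SUFFIXES = ("ing", "ed", "s")  # hypothetical, heavily simplified rule set


def toy_english_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    # No Latin suffix matched, so the token is returned as-is.
    return token


# Mixed tokens as they might come out of a CJK tokenizer.
tokens = ["searching", "indexed", "documents", "검색", "検索", "搜索"]
stemmed = [toy_english_stem(t) for t in tokens]
print(stemmed)
```

Only the English tokens change here. A token that mixes scripts and happens to end in a Latin suffix would still be touched by this sketch, so that is the case I would look at most closely in a real analyzer.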

Please share your experiences.

Thank you.

The settings I am testing look like this:

"korean_english": {
  "filter": [
    "trim",
    "arirang_filter", (custom opensource)
    "decompounder",
    "delimiter",
    "lowercase",
    "english_stop",
    "english_stemmer"
  ],
  "tokenizer": "arirang_tokenizer" (custom opensource)
},

"japanese_filter": {
  "type": "custom",
  "tokenizer": "kuromoji_tokenizer", (custom opensource)
  "filter": [
    "kuromoji_baseform",
    "kuromoji_part_of_speech",
    "cjk_width",
    "stop",
    "english_stop",
    "delimiter",
    "kuromoji_stemmer",
    "english_stemmer",
    "lowercase"
  ]
}

"delimiter": {
  "type": "word_delimiter",
  "catenate_all": true,
  "type_table_path": "delimiterType.json",
  "type_table": true,
  "split_on_numerics": false
},

"english_stemmer": {
  "type": "stemmer",
  "language": "english"
},
"english_stop": {
  "type": "stop",
  "stopwords": "_english_"
},
