Compound Word Token Filter Configuration File

Marcos_Cazulo · September 14, 2015, 3:30pm

Hi,

I am trying to switch the compound word token filter from an inline word_list to a file containing the list of words.
Right now I have it working with the filter defined as follows:

"compound_word_splitter": {
  "type": "dictionary_decompounder",
  "min_word_size": 4,
  "min_subword_size": 3,
  "word_list": ["icecream", "smokehouse", "car"]
}

I would like to keep the word list in a separate file, as the docs describe:

"compound_word_splitter": {
  "type": "dictionary_decompounder",
  "min_word_size": 4,
  "min_subword_size": 3,
  "word_list_path": "analysis/theWords.txt"
}

I tried the following two formats for the word list file, but the filter does not work with either:

"icecream","smokehouse","car"
icecream,smokehouse,car

Does anyone have an idea what I am missing so that the list of words is recognized?
I have placed analysis/theWords.txt relative to the config file.

Thank you.

Marcos_Cazulo · September 14, 2015, 5:36pm

Well, after experimenting, it turns out that the words in the file need to be separated by newlines (one word per line), similar to how it is specified for stop words.

It would be great if the documentation for the compound word token filter also mentioned this detail.
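
For anyone converting an existing comma-separated list, here is a minimal sketch in Python that rewrites it into the one-word-per-line layout that worked above. The helper names (convert_comma_separated, write_word_list) are made up for illustration and are not part of Elasticsearch; the output path is assumed to be wherever your word_list_path setting points.

```python
from pathlib import Path


def convert_comma_separated(text):
    """Split a line such as '"icecream","smokehouse","car"' into bare words.

    Surrounding whitespace and double quotes are stripped; empty entries are dropped.
    """
    words = []
    for part in text.split(","):
        word = part.strip().strip('"').strip()
        if word:
            words.append(word)
    return words


def write_word_list(words, path):
    """Write the words to `path`, one per line, encoded as UTF-8."""
    cleaned = [w.strip() for w in words if w.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


if __name__ == "__main__":
    # Assumed location: adjust so it matches your word_list_path setting.
    words = convert_comma_separated('"icecream","smokehouse","car"')
    write_word_list(words, "theWords.txt")
```

The resulting file holds icecream, smokehouse and car on three separate lines, with no quotes or commas.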

warkolm · September 15, 2015, 10:03pm

Thanks for the suggestion, I've raised https://github.com/elastic/elasticsearch/issues/13595 to get that fixed.
