Stop token filter: how to set a whitespace character as a stop word in the stopwords_path config file

chenchuangc · August 25, 2021, 2:59am

Elasticsearch 7.5
Thank you for your help!

I want to set stop words via stopwords_path. I put many words in the stopwords.txt file, and most of them take effect, but the whitespace character does not.

PUT  jieba_test
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords_path": "stopwords.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop"
          ]
        }
      }
    }
  }
}

When I switch to the inline stopwords setting, as below, it works:

PUT  jieba_test02
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords": [ " ", "is", "the" ]
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop"
          ]
        }
      }
    }
  }
}

I would like to know how to set a whitespace character as a stop token in the config file. Sorry to bother you.

spinscale · August 25, 2021, 9:36am

From a quick look at the source, this does not seem to be supported. The code that loads stop words from a file is ultimately this one: elasticsearch/Analysis.java at master · elastic/elasticsearch · GitHub

It contains this check for each line it reads:

if (Strings.hasText(word) == false) {
  continue;
}

So if an empty line is encountered (lines that contain only whitespace also count as empty), that line is skipped. That is why a whitespace-only entry in stopwords.txt never takes effect.
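
To make the skip concrete, here is a minimal standalone sketch of that kind of loading loop. It is not the actual Elasticsearch code: the class `StopwordFileSketch` and the methods `hasText` and `loadWords` are hypothetical names, and `hasText` is only assumed to behave like `Strings.hasText` (true only when the string has at least one non-whitespace character).

```java
import java.util.ArrayList;
import java.util.List;

public class StopwordFileSketch {

    // Hypothetical stand-in for Strings.hasText: returns true only if the
    // string contains at least one non-whitespace character.
    public static boolean hasText(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Mimics reading a stop word file line by line: lines that are empty or
    // whitespace-only are skipped and never become stop words.
    public static List<String> loadWords(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String word : lines) {
            if (hasText(word) == false) {
                continue;
            }
            result.add(word);
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated file content: "is", a single space, "the", an empty line.
        List<String> lines = List.of("is", " ", "the", "");
        System.out.println(loadWords(lines));
    }
}
```

In this sketch the line holding a single space is dropped before it reaches the word list, which matches the behavior described above for stopwords_path. The inline `stopwords` array in the second request was reported to work, so it presumably does not pass through this check.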

chenchuangc · September 14, 2021, 1:02pm

Thank you so much.

system · October 12, 2021, 1:03pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
