Stop token filter: how to set a whitespace character as a stop word in the stopwords_path file

Elasticsearch 7.5
Thank you for your help!

I want to define my stop words through stopwords_path. I put many words in stopwords.txt, and most of them take effect, but the whitespace character does not.

PUT jieba_test
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords_path": "stopwords.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop"
          ]
        }
      }
    }
  }
}

When I switch to the inline stopwords setting, as below, the whitespace stop word works:

PUT jieba_test02
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords": [ " ", "is", "the" ]
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop"
          ]
        }
      }
    }
  }
}

How can I set a whitespace character as a stop word in the stopwords file? Sorry to bother you.

From a quick look at the source, this does not seem to be supported. The code that loads stop words from a file is ultimately this one: elasticsearch/Analysis.java at master · elastic/elasticsearch · GitHub

While reading the file line by line, it runs this check on each line:

if (Strings.hasText(word) == false) {
  continue;
}

So if an empty line is encountered (a line that contains only whitespace also counts as empty), that line is skipped. A whitespace-only entry in stopwords.txt therefore never makes it into the stop word set.
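
One possible workaround, which I have not tested: since the inline stopwords setting does accept " " (as in your jieba_test02 index), you could chain two stop filters, one that reads the file and one that only carries the whitespace entry. A minimal sketch of the request body for something like PUT jieba_test03 follows. The index name and the filter names jieba_stop_file and whitespace_stop are made up for illustration; the rest is taken from your two requests.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop_file": {
          "type": "stop",
          "stopwords_path": "stopwords.txt"
        },
        "whitespace_stop": {
          "type": "stop",
          "stopwords": [ " " ]
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop_file",
            "whitespace_stop"
          ]
        }
      }
    }
  }
}
```

I used two separate filters rather than putting stopwords and stopwords_path on the same filter because I am not sure the two settings get merged when both are present. Running the _analyze API against my_ana on a sentence containing spaces should show whether the whitespace tokens are gone.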

Thank you so much :heartpulse:
