When using a whitespace tokenizer the stop words filter doesn't work


(Imran Azad) #1

When using the whitespace tokenizer, the stop words filter doesn't work. Here are the requests to reproduce the problem:

    PUT /my_index
    {
      "settings": {
        "index": {
          "number_of_shards": 1,
          "analysis": {
            "analyzer": {
              "fulltext": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["english_stop"]
              }
            },
            "filter": {
              "english_stop": {
                "type": "stop",
                "stopwords": "_english_"
              }
            }
          }
        }
      }
    }
    GET my_index/_analyze?analyzer=fulltext&text="the drug"

I need to use the whitespace tokenizer because I'm also using the word_delim filter, which turns terms like "wi-fi" into "wifi". If I use the standard tokenizer, I will lose this ability.


(Lee Hinman) #2

I replied here:


with an example. The stop filter does still work; however, since you have no
lowercase filter, only already-lowercase stop words will be removed.
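
A minimal sketch of that fix, assuming the index is deleted and recreated (analyzers cannot be changed on an open index) and that the built-in "lowercase" token filter is acceptable for this field. It reuses the names from the original post and places "lowercase" ahead of "english_stop" so that tokens such as "The" are lowercased before the stop list is checked. The word_delim filter is left out because its definition was not posted.

    PUT /my_index
    {
      "settings": {
        "index": {
          "number_of_shards": 1,
          "analysis": {
            "analyzer": {
              "fulltext": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase", "english_stop"]
              }
            },
            "filter": {
              "english_stop": {
                "type": "stop",
                "stopwords": "_english_"
              }
            }
          }
        }
      }
    }

    GET my_index/_analyze?analyzer=fulltext&text=The+drug

The quotes are omitted from the text parameter here because the whitespace tokenizer splits only on whitespace, so any quote characters passed in the text would stay attached to the tokens and could keep them from matching the stop list. With this analyzer the request should return only the token drug.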

