When using a whitespace tokenizer the stop words filter doesn't work

Imran_Azad · January 7, 2016, 4:45pm

When using the whitespace tokenizer, the stop words filter doesn't work. Here are the requests to reproduce it:

    PUT /my_index
    {
      "settings": {
        "index": {
          "number_of_shards": 1,
          "analysis": {
            "analyzer": {
              "fulltext": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["english_stop"]
              }
            },
            "filter": {
              "english_stop": {
                "type": "stop",
                "stopwords": "_english_"
              }
            }
          }
        }
      }
    }

    GET my_index/_analyze?analyzer=fulltext&text="the drug"

I need to be able to use the whitespace tokenizer because I'm also using the word_delim filter, which turns terms like "wi-fi" into "wifi". If I use the standard tokenizer, I lose this ability.

dakrone · January 8, 2016, 10:14pm

I replied here:

with an example. The stop filter still works, but since you have no
lowercase filter, only stop words that are already lowercase will be removed.
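
A minimal sketch of what that change could look like, assuming the goal is simply to lowercase tokens before stop word removal. It is the same request body as in the original post for `PUT /my_index`, with the built-in `lowercase` token filter placed ahead of `english_stop`. The ordering is the point here: filters run in the order listed, so the stop filter sees lowercased tokens.

```json
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "analysis": {
        "analyzer": {
          "fulltext": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["lowercase", "english_stop"]
          }
        },
        "filter": {
          "english_stop": {
            "type": "stop",
            "stopwords": "_english_"
          }
        }
      }
    }
  }
}
```

If the original casing of the remaining tokens needs to be preserved, setting `"ignore_case": true` on the `english_stop` filter may be an alternative to adding `lowercase`. Either way, it is worth re-running the `_analyze` request to confirm which tokens come out.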
