When creating an analyzer, what is the difference between "stopwords" and "filter": "stop"?

Hello,

I want to create my own analyzer, and I don't understand the difference between "filter": "stop" and the "stopwords" option, which seem to overlap. Do they do the same thing?

Examples from doc:

What is the difference between both?

Thank you,
Yoann

Hi @Yoann_Buzenet,

In Elasticsearch, both are about removing stop words (common words such as "the", "and", "a") during text analysis, but they sit at different levels. "filter": "stop" in a custom analyzer adds the built-in stop token filter to the filter chain; with no further configuration it removes the default English stop word list. "stopwords" is a parameter that says which words to remove: you set it on a stop filter you define yourself ("type": "stop"), or on an analyzer that accepts it, such as the standard analyzer. So the filter is the tool that does the removing, and "stopwords" is the word list you hand to that tool. Define a named stop filter when you want a custom list that several analyzers can share.
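
To make the contrast concrete, here is a minimal sketch of the two forms side by side. The index name stop_demo and both analyzer names are made up for illustration. The first analyzer relies on the built-in stop filter and its default English list, while the second passes "stopwords" directly to the standard analyzer.

```console
PUT /stop_demo
{
  "settings": {
    "analysis": {
      "analyzer": {
        "builtin_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        },
        "standard_with_stopwords": {
          "type": "standard",
          "stopwords": ["the", "and", "a"]
        }
      }
    }
  }
}
```

Neither analyzer here needs an entry under "analysis.filter", because neither one customizes the stop filter itself.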

Simple Example:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stop_filter"
          ]
        }
      },
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": ["the", "and", "a"]
        }
      }
    }
  }
}

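In this example, we create an analyzer called my_analyzer that uses a custom filter named my_stop_filter to remove the words "the", "and", and "a" from the token stream. Note that the settings above only define the analyzer; it takes effect once you assign it to a text field in the mapping, and it is then applied at index time and, by default, at search time as well.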
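
If you want to check what the analyzer does, the _analyze API can run it on a sample string. This is a sketch that assumes the index was created as my_index with the settings shown; the sample text is made up.

```console
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The quick fox and a dog"
}
```

With those settings, the response should list the tokens quick, fox, and dog, since "the", "and", and "a" are dropped after lowercasing.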
