Problem with stopword filter, SimpleQueryStringQuery and default operator AND

Hi everybody,

I have a problem with a Simple Query String Query that uses default operator AND together with language-specific stopword filters. I configured the index to apply the stopword filters both to the indexed documents (analyzer mapping parameter) and to the search query (search_analyzer mapping parameter). However, the stopword filter does not seem to be applied correctly to the search query when I query fields that have different search analyzers.

I've built a small example that shows the problem:

  1. Create an index mytest with custom analyzers and dynamic templates:
PUT /mytest
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "german_stop": {
          "type": "stop",
          "stopwords": "_german_"
        }
      },
      "analyzer": {
        "my_english_analyzer": {
          "tokenizer": "standard",
          "filter": ["english_stop"]
        },
        "my_german_analyzer": {
          "tokenizer": "standard",
          "filter": ["german_stop"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "dynamic": true,
      "dynamic_templates": [
        {
          "english_text": {
            "match": "*_text_en",
            "mapping": {
              "type": "text",
              "analyzer": "my_english_analyzer",
              "search_analyzer": "my_english_analyzer"
            }
          }
        },
        {
          "german_text": {
            "match": "*_text_de",
            "mapping": {
              "type": "text",
              "analyzer": "my_german_analyzer",
              "search_analyzer": "my_german_analyzer"
            }
          }
        }
      ]
    }
  }
}
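To check what each analyzer does with the query text on its own, the _analyze API can be run against the index. This is a diagnostic sketch that reuses the analyzer names defined above:

```
POST /mytest/_analyze
{
  "analyzer": "my_german_analyzer",
  "text": "Frankfurt am Main"
}

POST /mytest/_analyze
{
  "analyzer": "my_english_analyzer",
  "text": "Frankfurt am Main"
}
```

If "am" is not part of the _english_ stopword list, the second request should still return it as a token. That difference matters once both analyzers are applied to the same query string.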
  1. Add two test documents (one analyzed with my_english_analyzer, the other with my_german_analyzer):
PUT /mytest/_doc/1
{
  "title_text_de": "Frankfurt am Main",
  "content_text_de": "Frankfurt am Main ist die fünftgrößte Stadt Deutschlands."
}
PUT /mytest/_doc/2
{
  "title_text_en": "London",
  "content_text_en": "London is the capital city of the United Kingdom."
}
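You can confirm that the dynamic templates assigned the intended analyzers to the four fields with the mapping API:

```
GET /mytest/_mapping
```

If the *_text_de and *_text_en patterns matched as intended, title_text_de and content_text_de should be listed with my_german_analyzer. Likewise, title_text_en and content_text_en should be listed with my_english_analyzer.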
  1. Search for "Frankfurt Main" with default operator AND on all fields (*_text_*). As expected, the response contains document 1.
POST /mytest/_search
{
  "query": {
    "simple_query_string" : {
      "query": "Frankfurt Main",
      "fields": ["*_text_*"],
      "default_operator": "and"
    }
  }
}
  1. Search again, but add the German stopword "am" to the query. Now the response contains no documents. I expected document 1 to be found again, because the stopword "am" should be removed from the search query. The result is also empty with other German stopwords (e.g. "im", "über") or with re-ordered search terms (e.g. "Frankfurt Main am").
POST /mytest/_search
{
  "query": {
    "simple_query_string" : {
      "query": "Frankfurt am Main",
      "fields": ["*_text_*"],
      "default_operator": "and"
    }
  }
}
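The validate API with explain=true returns the Lucene query that Elasticsearch generates for this request across the matched fields. This is a diagnostic sketch; the exact format of the explanation string differs between versions:

```
GET /mytest/_validate/query?explain=true
{
  "query": {
    "simple_query_string" : {
      "query": "Frankfurt am Main",
      "fields": ["*_text_*"],
      "default_operator": "and"
    }
  }
}
```

Compare this output with the one for "fields": ["*_text_de"]. The comparison should show whether "am" survives as a required clause on the *_text_en fields, where my_english_analyzer would not remove it.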
  1. Search again with the stopword, but limit the fields to those analyzed with my_german_analyzer (*_text_de). Now the response contains the Frankfurt document as expected. The same holds for other German stopwords and for re-ordered search terms.
POST /mytest/_search
{
  "query": {
    "simple_query_string" : {
      "query": "Frankfurt am Main",
      "fields": ["*_text_de"],
      "default_operator": "and"
    }
  }
}
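If per-language matching is what is wanted, one possible workaround is a bool query. This is a sketch, not verified against 6.6.1. Each simple_query_string clause is restricted to fields that share the same search analyzer:

```
POST /mytest/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "simple_query_string" : {
            "query": "Frankfurt am Main",
            "fields": ["*_text_de"],
            "default_operator": "and"
          }
        },
        {
          "simple_query_string" : {
            "query": "Frankfurt am Main",
            "fields": ["*_text_en"],
            "default_operator": "and"
          }
        }
      ]
    }
  }
}
```

The bool query has only should clauses and no must or filter clause, so a document needs to match at least one of them. The German clause alone should therefore be enough to return document 1.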

I ran all tests on a clean Elasticsearch 6.6.1 instance in the official Elasticsearch Docker container.

Is my configuration or my expectation wrong, or is this a bug in Elasticsearch?
While researching this problem, I found the following GitHub issues, which were fixed in Elasticsearch 6.3.x and 6.4.1 respectively. Both seem to be related to my problem.
