Hi everybody,
I have a problem with a Simple Query String query that uses the default operator AND together with language-specific stopword filters. I configured the index to apply the stopword filters to the indexed documents (analyzer config parameter) as well as to the search query (search_analyzer config parameter). However, it seems that the stopword filter is not applied correctly to the search query if I query fields with different search analyzers.
I've built a small example that shows the problem:
- Create an index mytest with a custom template:
PUT /mytest
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "german_stop": {
          "type": "stop",
          "stopwords": "_german_"
        }
      },
      "analyzer": {
        "my_english_analyzer": {
          "tokenizer": "standard",
          "filter": ["english_stop"]
        },
        "my_german_analyzer": {
          "tokenizer": "standard",
          "filter": ["german_stop"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "dynamic": true,
      "dynamic_templates": [
        {
          "english_text": {
            "match": "*_text_en",
            "mapping": {
              "type": "text",
              "analyzer": "my_english_analyzer",
              "search_analyzer": "my_english_analyzer"
            }
          }
        },
        {
          "german_text": {
            "match": "*_text_de",
            "mapping": {
              "type": "text",
              "analyzer": "my_german_analyzer",
              "search_analyzer": "my_german_analyzer"
            }
          }
        }
      ]
    }
  }
}
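As a sanity check on the analyzer definitions themselves, the custom analyzers can be run directly against the query text with the _analyze API. This is only a minimal sketch that assumes the index was created exactly as above. The request is not part of my original test run.

```
GET /mytest/_analyze
{
  "analyzer": "my_german_analyzer",
  "text": "Frankfurt am Main"
}
```

If the stop filter is wired up as intended, the returned token list should not contain "am". Running the same text through my_english_analyzer should show whether "am" survives there.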
- Add two test documents (one analyzed with my_english_analyzer and the other one with my_german_analyzer):
PUT /mytest/_doc/1
{
  "title_text_de": "Frankfurt am Main",
  "content_text_de": "Frankfurt am Main ist die fünftgrößte Stadt Deutschlands."
}
PUT /mytest/_doc/2
{
  "title_text_en": "London",
  "content_text_en": "London is the capital city of the United Kingdom."
}
- Now search for "Frankfurt Main" with default operator AND on all fields (*_text_*). The response contains the document with id 1 as expected.
POST /mytest/_search
{
  "query": {
    "simple_query_string": {
      "query": "Frankfurt Main",
      "fields": ["*_text_*"],
      "default_operator": "and"
    }
  }
}
- Search again, but add the (German) stopword "am" to the query. Now the response contains no documents, although I expected to find the document with id 1 again, as the stopword "am" should be filtered out of the search query. The search result is also empty when using other German stopwords (e.g. "im", "über" etc.) or when re-ordering the search terms (e.g. "Frankfurt Main am").
POST /mytest/_search
{
  "query": {
    "simple_query_string": {
      "query": "Frankfurt am Main",
      "fields": ["*_text_*"],
      "default_operator": "and"
    }
  }
}
- Search again with the stopword, but limit the fields to the ones analyzed with my_german_analyzer (*_text_de). Now the response contains the Frankfurt document as expected. The same holds for searches with other German stopwords or with re-ordered search terms.
POST /mytest/_search
{
  "query": {
    "simple_query_string": {
      "query": "Frankfurt am Main",
      "fields": ["*_text_de"],
      "default_operator": "and"
    }
  }
}
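To see how the failing query is actually rewritten per field, the validate API with the explain flag may help. Again, this is a sketch under the assumption that the index and documents from the steps above exist. I have not included its output here.

```
GET /mytest/_validate/query?explain=true
{
  "query": {
    "simple_query_string": {
      "query": "Frankfurt am Main",
      "fields": ["*_text_*"],
      "default_operator": "and"
    }
  }
}
```

Comparing the returned explanation with the one for "fields": ["*_text_de"] should show whether a clause for "am" remains on the fields analyzed with my_english_analyzer.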
I ran all tests against a clean Elasticsearch 6.6.1 instance in the official Elasticsearch Docker container.
Is my configuration or my expectation wrong, or is this a bug in Elasticsearch?
I did some research on this problem and found the following GitHub issues, which were fixed in Elasticsearch 6.3.x and 6.4.1, respectively. Both also seem to be related.