hyphenation_decompounder tokens do not respect "operator" in a multi_match query

Hi,

I used hyphenation_decompounder (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-hyp-decomp-tokenfilter.html) for German and followed the example in the documentation. So far so good: it works! The text kaffeetasse is tokenized into kaffee and tasse.
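To make the tokenization concrete, here is a simplified Python model of what the search analyzer produces for kaffeetasse. It assumes that, like Lucene's compound-word filters, the filter keeps the original token and emits each subword at the same position. `decompound` is a hypothetical helper. It only scans for word-list entries and ignores the hyphenation patterns from de_DR.xml. Its handling of only_longest_match (longest match per start offset) is also an interpretation, not the filter's actual code.

```python
from typing import List, Set, Tuple


def decompound(token: str, word_list: Set[str], min_subword_size: int = 4,
               only_longest_match: bool = True) -> List[Tuple[str, int]]:
    """Hypothetical, dictionary-only model of a compound-word filter.

    Returns (term, position) pairs. The original token is kept, and every
    word-list subword found inside it is added at the SAME position (0),
    so the subwords stack on top of the original like synonyms.
    Hyphenation patterns are ignored in this sketch.
    """
    tokens = [(token, 0)]
    for start in range(len(token)):
        # Candidate subwords starting at this offset that are long enough
        # and present in the word list.
        matches = [
            token[start:end]
            for end in range(start + min_subword_size, len(token) + 1)
            if token[start:end] in word_list
        ]
        if not matches:
            continue
        if only_longest_match:
            # Assumed interpretation: keep only the longest match per offset.
            matches = [max(matches, key=len)]
        tokens.extend((m, 0) for m in matches)
    return tokens


print(decompound("kaffeetasse", {"kaffee", "zucker", "tasse"}))
# [('kaffeetasse', 0), ('kaffee', 0), ('tasse', 0)]
```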

The problem arose when I built a multi_match query for kaffeetasse to find documents in which both kaffee AND tasse match. It seems that multi_match combines these tokens with OR instead of applying the operator given in the query. Here is my test case:

curl -XPUT "http://localhost:9200/testidx" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "index": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [ "lowercase" ]
          },
          "search": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [ "lowercase", "hyph" ]
          }
        },
        "filter": {
          "hyph": {
            "type": "hyphenation_decompounder",
            "hyphenation_patterns_path": "analysis/de_DR.xml",
            "word_list": ["kaffee", "zucker", "tasse"],
            "only_longest_match": true,
            "min_subword_size": 4
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "index",
        "search_analyzer": "search"
      },
      "description": {
        "type": "text",
        "analyzer": "index",
        "search_analyzer": "search"
      }
    }
  }
}'

curl -XPOST "http://localhost:9200/testidx/_doc/1" -H 'Content-Type: application/json' -d'
{
  "title": "Kaffee",
  "description": "Milch Kaffee tasse"
}'

curl -XPOST "http://localhost:9200/testidx/_doc/2" -H 'Content-Type: application/json' -d'
{
  "title": "Kaffee",
  "description": "Latte Kaffee Becher"
}'

curl -XGET "http://localhost:9200/testidx/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "kaffeetasse",
      "fields": ["title", "description"],
      "operator": "and",
      "type": "cross_fields",
      "analyzer": "search"
    }
  }
}'

I expected only document 1, since it contains both "kaffee" and "tasse" across its fields, but the query returns both documents, because each one contains at least one of "kaffee" or "tasse".
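A toy Python model contrasts the two outcomes. It assumes that the query-side tokens are the original compound plus both subwords, all at one position. It also assumes that terms sharing a position are treated like synonyms (any one may match), while "and" applies only across distinct positions. That behavior is an assumption for illustration, not a statement of how Elasticsearch builds the query. The model also ignores scoring and simply pools title and description, roughly as cross_fields treats them as one field.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

DOCS: Dict[int, Dict[str, str]] = {
    1: {"title": "Kaffee", "description": "Milch Kaffee tasse"},
    2: {"title": "Kaffee", "description": "Latte Kaffee Becher"},
}

# Assumed search-analyzer output for "kaffeetasse": all at position 0.
QUERY_TOKENS: List[Tuple[str, int]] = [("kaffeetasse", 0), ("kaffee", 0), ("tasse", 0)]


def index_terms(doc: Dict[str, str]) -> Set[str]:
    # Index analyzer: whitespace tokenizer + lowercase, pooled across fields.
    return {t.lower() for text in doc.values() for t in text.split()}


def position_grouped_and(terms: Set[str], query_tokens: List[Tuple[str, int]]) -> bool:
    # Group query terms by position: OR within a group, AND across groups.
    by_pos: Dict[int, Set[str]] = defaultdict(set)
    for term, pos in query_tokens:
        by_pos[pos].add(term)
    return all(terms & group for group in by_pos.values())


# What the post expected: kaffee AND tasse both required.
expected = sorted(d for d, doc in DOCS.items()
                  if all(t in index_terms(doc) for t in ("kaffee", "tasse")))
# What the position-grouped model returns: one group, so effectively OR.
observed_model = sorted(d for d, doc in DOCS.items()
                        if position_grouped_and(index_terms(doc), QUERY_TOKENS))

print(expected)        # [1]
print(observed_model)  # [1, 2]
```

Under these assumptions, the single position group explains why document 2 matches on "kaffee" alone.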

At first glance, this looks like a bug to me. Any thoughts?

Elasticsearch: 7.9.2
de_DR.xml: downloaded from https://sourceforge.net/projects/offo/files/offo-hyphenation/1.2/offo-hyphenation_v1.2.zip/download, as described in the documentation.


For non-German speakers:
kaffeetasse => coffee cup (kaffee => coffee, tasse => cup)
