The problem was also discussed by @singer and @hbruch.
Here is an example in which the search does not work as expected:
Create an index:
PUT example/
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"filter": {
"decomp_de": {
"type": "hyphenation_decompounder",
"word_list": ["kaffee", "tasse", "tüte"],
"hyphenation_patterns_path": "hyph/de_DR.xml",
"min_subword_size": 3,
"only_longest_match": true
}
},
"analyzer": {
"german_analyzer": {
"filter": [
"lowercase",
"decomp_de",
"unique"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
},
"mappings": {
"properties": {
"text": {
"type": "text",
"analyzer": "german_analyzer",
"norms": false
},
"type": {
"type": "text",
"analyzer": "german_analyzer",
"norms": false
}
}
}
}
Index documents (bulk request body):
{"index":{"_id":1}}
{"text": "Kaffeetasse", "type":"Tasse"}
{"index":{"_id":2}}
{"text": "Kaffeetüte", "type": "Tüte"}
Search for Kaffeetasse:
{
"query": {
"multi_match": {
"query": "Kaffeetasse",
"fields": [
"text",
"type"
],
"type": "cross_fields",
"operator": "and",
"slop": 1,
"prefix_length": 0,
"max_expansions": 50,
"zero_terms_query": "none",
"auto_generate_synonyms_phrase_query": "true",
"fuzzy_transpositions": false,
"boost": 1
}
}
}
The unexpected result, regardless of which type, operator, or minimum_should_match is given, is:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "example",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931471,
"_source" : {
"text" : "Kaffeetasse",
"type" : "Tasse"
}
},
{
"_index" : "example",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.25069216,
"_source" : {
"text" : "Kaffeetüte",
"type" : "Tüte"
}
}
]
}
}
The expected result is only the document containing Kaffeetasse.
Analyzing:
GET example/_analyze
{
"analyzer": "german_analyzer",
"text": "Kaffeetasse"
}
produces
{
"tokens" : [
{
"token" : "kaffeetasse",
"start_offset" : 0,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "kaffee",
"start_offset" : 0,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "tasse",
"start_offset" : 0,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 0
}
]
}
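One detail in this output stands out: all three tokens share position 0. The small Python sketch below (plain Python, not an Elasticsearch API) groups the tokens from the response by position to make that visible. My assumption, which is not confirmed anywhere in this thread, is that tokens emitted at the same position are treated as alternatives of one another, so "operator": "and" has no second position to combine with.

```python
from collections import defaultdict

# Tokens copied from the _analyze response above (only the fields needed here).
tokens = [
    {"token": "kaffeetasse", "position": 0},
    {"token": "kaffee", "position": 0},
    {"token": "tasse", "position": 0},
]


def group_by_position(tokens):
    """Map each token position to the list of tokens emitted at it."""
    groups = defaultdict(list)
    for t in tokens:
        groups[t["position"]].append(t["token"])
    return dict(groups)


groups = group_by_position(tokens)
print(groups)  # {0: ['kaffeetasse', 'kaffee', 'tasse']}
```

If that assumption holds, the compound and its subwords form a single group, which would be consistent with the OR-style explanation shown by _validate/query.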
IMHO the query should be rewritten to (text:kaffeetasse OR (text:kaffee AND text:tasse)) OR (type:kaffeetasse OR (type:kaffee AND type:tasse))
and not to (text:kaffeetasse OR text:kaffee OR text:tasse OR type:kaffeetasse OR type:kaffee OR type:tasse).
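To spell out the structure I have in mind, here is a minimal Python sketch that builds the expected query by hand as a bool query. The helper name expected_query is hypothetical, and this is not what multi_match generates; it only illustrates the intended semantics. It uses term queries, which are not analyzed, so the lowercased tokens have to be supplied as they appear in the _analyze output.

```python
import json


def expected_query(compound, parts, fields):
    """Per field: match the compound itself OR all of its subwords together."""
    per_field = []
    for field in fields:
        per_field.append({
            "bool": {
                "should": [
                    {"term": {field: compound}},
                    # All subwords must be present in the same field.
                    {"bool": {"must": [{"term": {field: p}} for p in parts]}},
                ],
                "minimum_should_match": 1,
            }
        })
    # Any one field satisfying its clause is enough.
    return {"query": {"bool": {"should": per_field, "minimum_should_match": 1}}}


body = expected_query("kaffeetasse", ["kaffee", "tasse"], ["text", "type"])
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The printed JSON could be sent as a search request body against the example index to compare its hits with those of the multi_match query.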
With the _validate/query API:
GET example/_validate/query?explain=true&rewrite=true
{
"query": {
"multi_match": {
"query": "Kaffeetasse",
"fields": [
"text",
"type"
],
"type": "cross_fields",
"operator": "and",
"slop": 0,
"prefix_length": 0,
"max_expansions": 50,
"minimum_should_match": "-45%",
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": "false",
"fuzzy_transpositions": false,
"boost": 1
}
}
}
I got:
{
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"valid" : true,
"explanations" : [
{
"index" : "example",
"valid" : true,
"explanation" : "(text:kaffeetasse | text:kaffee | text:tasse | type:kaffeetasse | type:kaffee | type:tasse)"
}
]
}
or, without the rewrite parameter:
{
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"valid" : true,
"explanations" : [
{
"index" : "example",
"valid" : true,
"explanation" : "blended(terms:[text:kaffeetasse, text:kaffee, text:tasse, type:kaffeetasse, type:kaffee, type:tasse])"
}
]
}
Searching for Kaffee should return both documents, searching for Kaffeetasse or tasse should return the document with id 1, and searching for Kaffeetüte or tüte should return the document with id 2.
Or what am I misunderstanding?
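To make these expectations concrete, here is a small Python simulation of both interpretations over the two example documents. It is only a sketch under assumptions: the indexed terms for Kaffeetüte are assumed to be decompounded analogously to Kaffeetasse (only the latter was actually analyzed above), simple words are assumed to yield a single token, and the function names are hypothetical.

```python
# Indexed terms per field; the terms for document 2 are an assumption.
docs = {
    1: {"text": {"kaffeetasse", "kaffee", "tasse"}, "type": {"tasse"}},
    2: {"text": {"kaffeetüte", "kaffee", "tüte"}, "type": {"tüte"}},
}

# Analyzed query: (whole word, subwords). Subwords are empty for simple words.
queries = {
    "Kaffee": ("kaffee", []),
    "Kaffeetasse": ("kaffeetasse", ["kaffee", "tasse"]),
    "tasse": ("tasse", []),
    "Kaffeetüte": ("kaffeetüte", ["kaffee", "tüte"]),
    "tüte": ("tüte", []),
}


def matches_expected(terms, whole, parts):
    """Expected: the whole word OR all subwords in the same field."""
    return whole in terms or (bool(parts) and all(p in terms for p in parts))


def matches_actual(terms, whole, parts):
    """Observed: the whole word OR any single subword."""
    return whole in terms or any(p in terms for p in parts)


def search(match, whole, parts):
    """Return the sorted ids of documents where at least one field matches."""
    return sorted(
        doc_id
        for doc_id, fields in docs.items()
        if any(match(terms, whole, parts) for terms in fields.values())
    )


for name, (whole, parts) in queries.items():
    print(name, search(matches_expected, whole, parts), search(matches_actual, whole, parts))
```

Under these assumptions the two interpretations differ only for the compound queries: Kaffeetasse and Kaffeetüte each return a single document with the expected semantics, but both documents with the observed one, because the shared subword kaffee matches either document.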