Query does not work without specifying analyzers


(Alfred Bez) #1

Hi guys, I've a problem and I'm stuck:

We have a page called 'Jobs und Karriere' (German). Searching for 'Jobs' returns the expected result, but searching for 'Karriere' doesn't find anything.
Here's my query:

# Query
{
    "query": {
        "bool": {
            "must": [{
                "match": {
                    "_all": {
                        "query": "karriere",
                        "operator": "and",
                        "boost": 2,
                        "fuzziness": 1
                    }
                }
            }, {
                "match": {
                    "type": {
                        "query": "pages"
                    }
                }
            }],
            "should": [{
                "match": {
                    "title": {
                        "query": "karriere",
                        "boost": 20,
                        "fuzziness": 1
                    }
                }
            }, {
                "match": {
                    "content": {
                        "query": "karriere",
                        "fuzziness": 1
                    }
                }
            }],
            "minimum_should_match": 1
        }
    },
    "from": 0,
    "size": 4,
    "_source": ["title", "url", "content", "created", "category"]
}

I tried to debug the query against the matching document ID with the Explain API, with no luck. Then I tried to reproduce the problem locally (I had been working on staging before) and got the expected result for both queries. So there must be some difference between the two environments. Maybe someone can help me or point me in the right direction.

I do get results (even on staging) when I set the analyzer directly in my query, like so:

query -> bool -> must -> match -> _all -> "analyzer": "EdgenGram_analyzer"
query -> bool -> should -> match -> title -> "analyzer": "nGram_analyzer"
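
Written out in full, the two overrides described by the path notation above would look roughly like this. This is a sketch reconstructed from that notation, not a copy of the exact request; `from`, `size` and `_source` are left out for brevity and everything else is unchanged from the query at the top.

```json
{
    "query": {
        "bool": {
            "must": [{
                "match": {
                    "_all": {
                        "query": "karriere",
                        "operator": "and",
                        "boost": 2,
                        "fuzziness": 1,
                        "analyzer": "EdgenGram_analyzer"
                    }
                }
            }, {
                "match": {
                    "type": {
                        "query": "pages"
                    }
                }
            }],
            "should": [{
                "match": {
                    "title": {
                        "query": "karriere",
                        "boost": 20,
                        "fuzziness": 1,
                        "analyzer": "nGram_analyzer"
                    }
                }
            }, {
                "match": {
                    "content": {
                        "query": "karriere",
                        "fuzziness": 1
                    }
                }
            }],
            "minimum_should_match": 1
        }
    }
}
```

The `analyzer` parameter of a `match` query applies only to the query text, so these overrides change how 'karriere' is analyzed at search time, not how the documents were indexed.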

What I don't understand: nGram_analyzer is already set for the title field and EdgenGram_analyzer for _all (see mappings below). Is it possible that they are overridden at some point?
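
One way to check what each analyzer does to the search term would be the `_analyze` API. As a sketch (this was not run as part of the debugging above), a request body like the following, sent to `my_index/_analyze`, should list the tokens that `simple_search` produces for the term. `simple_search` is the `search_analyzer` configured for both `_all` and `title` in the mapping below, and its definition is among the elided parts of the settings.

```json
{
    "analyzer": "simple_search",
    "text": "karriere"
}
```

Replacing `simple_search` with `nGram_analyzer` or `EdgenGram_analyzer` would show the tokens those analyzers produce for comparison. If the output for the same analyzer differs between local and staging, that would point to the analyzer definitions in the index settings rather than to the mapping.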

Additional information:
# (shortened) Mapping (staging)
{
    "state": "open",
    "settings": {
        "index": {
            "number_of_shards": "5",
            "provided_name": "my_index",
            "creation_date": "1531310116997",
            "analysis": {
                "filter": {
                    // ...
                },
                "analyzer": {
                    "nGram_analyzer": {
                        "filter": ["nGram_filter", "lowercase", "projectname_stopwords", "projectname_stemmer"],
                        "char_filter": ["html_filter"],
                        "type": "custom",
                        "tokenizer": "whitespace"
                    },
                    "EdgenGram_analyzer": {
                        "filter": ["EdgenGram_filter", "lowercase", "projectname_stopwords", "projectname_stemmer"],
                        "char_filter": ["html_filter"],
                        "type": "custom",
                        "tokenizer": "whitespace"
                    },
                    // ...
                },
                // ...
            },
            "number_of_replicas": "1",
            "uuid": "j2mKZeGlR3OLOQ2V_hDK_g",
            "version": {
                "created": "5040399"
            }
        }
    },
    "mappings": {
        "documents": {
            "_all": {
                "search_analyzer": "simple_search",
                "analyzer": "EdgenGram_analyzer",
                "enabled": true
            },
            "properties": {
                // ...
                "title": {
                    "search_analyzer": "simple_search",
                    "analyzer": "nGram_analyzer",
                    "type": "text"
                },
                "type": {
                    "type": "keyword"
                },
                // ...
            }
        }
    },
    // ...
}

# Mapping diff
$ curl -sS -XGET 1.2.3.4:9200/my_index/_mapping/documents?pretty > server.mapping.json
$ curl -sS -XGET localhost:9200/my_index/_mapping/documents?pretty > local.mapping.json
$ diff -c3 local.mapping.json server.mapping.json
*** local.mapping.json  2018-11-16 10:48:57.391550907 +0100
--- server.mapping.json 2018-11-16 10:48:42.698965949 +0100
***************
*** 3,8 ****
--- 3,9 ----
      "mappings" : {
        "documents" : {
          "_all" : {
+           "enabled" : true,
            "analyzer" : "EdgenGram_analyzer",
            "search_analyzer" : "simple_search"
          },