Multi match query searching on fields not specified


(John D. Ament) #1

I've set up index settings on my cluster that include an ngram analyzer. The definition looks like this:

{
  "index": {
    "analysis": {
      "filter": {
        "ngrams_filter": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngrams_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngrams_filter"
          ]
        }
      }
    }
  }
}
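The sketch below is a rough Python model of this analyzer chain, which makes it easier to see what ends up in the index. It is not Elasticsearch code. The `standard` tokenizer is approximated by splitting on runs of word characters, which is an assumption: the real tokenizer follows Unicode word-break rules. The chain then lowercases each token and emits every substring whose length falls between `min_gram` and `max_gram`. The function name `ngram_analyze` is hypothetical, and the order of the grams may differ from what Lucene emits.

```python
import re


def ngram_analyze(text, min_gram=2, max_gram=20):
    """Rough model of ngrams_analyzer: tokenize, lowercase, then n-gram each token.

    Assumption: the standard tokenizer is approximated by splitting on runs of
    word characters; the real tokenizer uses Unicode word-break rules.
    """
    grams = []
    for token in re.findall(r"\w+", text):
        token = token.lower()  # the "lowercase" filter
        # The "ngrams_filter": every substring with length in [min_gram, max_gram].
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            for start in range(len(token) - size + 1):
                grams.append(token[start:start + size])
    return grams


print(ngram_analyze("sam"))  # ['sa', 'am', 'sam']
print(ngram_analyze("bob"))  # ['bo', 'ob', 'bob']
```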

After adding references to this analyzer in my mappings, my multi_match queries began returning unexpected results. Specifically, I index documents that look like this:

{
  "foo": {
    "type": "bob",
    "value": "sam"
  }
}

My multi_match query explicitly targets "foo.value". When I search for sa/sam it works. However, since applying these settings, searches for bo/bob return matches as well. Because I'm only searching on foo.value, I wouldn't expect this.

Any idea?


(Igor Motov) #2

Any chance you can show a complete example with mappings, queries and results?


(John D. Ament) #3

Sure. This is what the mapping looks like:

"someType": {
  "_id": {
    "path": "objectId"
  },
  "properties": {
    "record": {
      "type": "object",
      "properties": {
        "fields": {
          "type": "object",
          "properties": {
            "Title": {
              "type": "object",
              "properties": {
                "value": {
                  "type": "string",
                  "analyzer": "ngrams_analyzer"
                },
                "className": {
                  "type": "string",
                  "include_in_all": false,
                  "index": "no"
                }
              }
            },
            "comments": {
              "type": "object",
              "properties": {
                "value": {
                  "type": "string",
                  "analyzer": "ngrams_analyzer"
                },
                "className": {
                  "type": "string",
                  "include_in_all": false,
                  "index": "no"
                }
              }
            }
          }
        }
      }
    }
  }
}

"className" is the attribute I called "type" in my earlier example. My query is:

{
  "multi_match": {
    "query": "FooBar",
    "fields": [ "record.fields.*.value" ]
  }
}

In this case, FooBar appears only in the className attribute, not in the value of the Title or comments fields. If the explain output would help, let me know.


(John D. Ament) #4

Right after posting this, I realized that the scores for this case were all extremely low. It seems like ES is finding partial matches for the search term, but nothing concrete. As a workaround, I'm going to set min_score to 0.5 and see how it goes.
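For reference, `min_score` is a top-level parameter of the search request body, placed next to `query`. The following minimal Python sketch only builds and prints the JSON body. Sending it to the `_search` endpoint is left out, and 0.5 is the value being tried here, not a recommended setting. Keep in mind that Lucene relevance scores are not normalized, so a fixed cutoff that filters well for one query may not do so for another.

```python
import json

# Request body: the multi_match query from above plus a top-level min_score,
# which drops any hit whose score is below the threshold.
body = {
    "min_score": 0.5,
    "query": {
        "multi_match": {
            "query": "FooBar",
            "fields": ["record.fields.*.value"],
        }
    },
}

print(json.dumps(body, indent=2))
```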


(Igor Motov) #5

So you are searching for FooBar and it gives you back bob? Or are these completely unrelated examples? Sorry, I still can't figure out what you are trying to do and what does and doesn't work. I would be glad to help if I could easily reproduce the issue on my machine. Please see https://www.elastic.co/help for some suggestions on how to make your questions easier to understand. Thanks!


(John D. Ament) #6

Yes, these are the verbatim search terms. They're deliberately silly, but I did verify that the issue occurs with this content.

As far as I can tell, we're getting a very low score because a substring of "FooBar" matches "bob", specifically the "ob" part. I'm not sure whether this is the intended behavior for ngrams (breaking up the search term as well), but it's the only explanation I can come up with. I would have expected a score of 0, but if the search term is also being split into ngrams, this makes some sense to me.
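The "ob" overlap described above can be checked with a small sketch. It assumes that both the indexed value and the search term go through the same lowercase plus 2..20 gram chain. Both are single words, so tokenization is skipped. `ngrams` is a hypothetical helper, not an Elasticsearch API.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """All lowercased substrings of token with length in [min_gram, max_gram]."""
    token = token.lower()
    return {
        token[start:start + size]
        for size in range(min_gram, min(max_gram, len(token)) + 1)
        for start in range(len(token) - size + 1)
    }


query_grams = ngrams("FooBar")  # search term, split into grams at query time
doc_grams = ngrams("bob")       # indexed value, split into grams at index time

print(sorted(query_grams & doc_grams))  # ['ob']
```

Only one of the 15 grams of "foobar" is shared with "bob". That is consistent with a match that exists but scores very low.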


(Igor Motov) #7

OK, now it makes more sense. Indeed, by default the same analyzer is used for both indexing and searching. So at search time the search term is also split into n-grams. Because multi_match applies the "OR" operator to all tokens by default, it matches any field that contains at least one matching n-gram. To solve this problem, you need to replace the search analyzer with an analyzer that doesn't include the ngram filter.
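A simplified boolean model of the behavior described here appears below. It ignores scoring and treats each query gram as an independent term, which is a simplification of how Lucene actually builds the query. Under "OR" semantics, a field matches as soon as it shares one gram with the query. The helper names and the document values are hypothetical, and the field names only mirror the mapping above.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """All lowercased substrings of token with length in [min_gram, max_gram]."""
    token = token.lower()
    return {
        token[start:start + size]
        for size in range(min_gram, min(max_gram, len(token)) + 1)
        for start in range(len(token) - size + 1)
    }


def multi_match_or(query, doc_fields):
    """Return the fields that match under OR, with the same n-gram analyzer on both sides.

    A field matches if it contains at least one of the query's grams.
    Scoring is ignored; values are assumed to be single words.
    """
    query_terms = ngrams(query)
    return [
        name
        for name, value in doc_fields.items()
        if not query_terms.isdisjoint(ngrams(value))
    ]


# Hypothetical document: values are made up for illustration.
doc = {"Title.value": "bob", "comments.value": "sam"}
print(multi_match_or("FooBar", doc))  # ['Title.value']
```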


(John D. Ament) #8

How would I specify which analyzer to use for searching? I'm assuming it applies to the search term only?


(Igor Motov) #9

Sorry, I just realized that the "replace" link I posted above pointed to the wrong page. The search analyzer can be set with the search_analyzer parameter in the field mapping. In your case, it would look like this:

"comments": {
    "type": "object",
    "properties": {
        "value": {
            "type": "string",
            "analyzer": "ngrams_analyzer",
            "search_analyzer": "standard"
        },
        "className": {
            "type": "string",
            "include_in_all": false,
            "index": "no"
        }
    }
}
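To illustrate the effect of this mapping, the sketch below models both sides. The index side still produces n-grams, while the search side (search_analyzer: standard) only tokenizes and lowercases. Both tokenizers are approximated by splitting on runs of word characters, which is an assumption, and the helper names are hypothetical. Under this model, "FooBar" no longer matches "bob", but partial input such as "oba" still matches an indexed "FooBar". Partial matching is the reason to keep the ngram filter at index time.

```python
import re


def ngram_index_terms(text, min_gram=2, max_gram=20):
    """Index side (ngrams_analyzer): word tokens, lowercased, then n-grammed."""
    terms = set()
    for token in re.findall(r"\w+", text.lower()):
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            for start in range(len(token) - size + 1):
                terms.add(token[start:start + size])
    return terms


def standard_search_terms(text):
    """Search side (search_analyzer: standard): word tokens, lowercased, no n-grams."""
    return set(re.findall(r"\w+", text.lower()))


def matches_any(query_terms, field_terms):
    """OR semantics: at least one query term must appear among the field's terms."""
    return not query_terms.isdisjoint(field_terms)


bob = ngram_index_terms("bob")
foobar = ngram_index_terms("FooBar")

print(matches_any(standard_search_terms("FooBar"), bob))     # False: 'foobar' is not a gram of 'bob'
print(matches_any(standard_search_terms("FooBar"), foobar))  # True: the whole word is itself a gram
print(matches_any(standard_search_terms("oba"), foobar))     # True: partial input still matches
```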

See the search_analyzer parameter in the string mapping documentation for more information.

