Multi match query searching on fields not specified

John_D_Ament · September 5, 2015, 2:39am

I've set up index settings on my cluster that include an ngram analyzer. The definition looks like this:

{
  "index": {
    "analysis": {
      "filter": {
        "ngrams_filter": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngrams_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngrams_filter"
          ]
        }
      }
    }
  }
}

After adding references to this analyzer in my mappings, my multi match queries began returning weird results. Specifically, I index documents that look like

{
  "foo": {
    "type": "bob",
    "value": "sam"
  }
}

My multi_match query is set to search explicitly on "foo.value". When I search for sa/sam it works. However, since applying these settings, searches for bo/bob return matches as well. If I'm only searching on foo.value, I wouldn't expect this.

Any idea?

Igor_Motov · September 5, 2015, 3:04am

Any chance you can show a complete example with mappings, queries and results?

John_D_Ament · September 5, 2015, 3:42pm

Sure. This is what the mapping looks like

"someType": {
  "_id": {
    "path": "objectId"
  },
  "properties": {
    "record": {
      "type": "object",
      "properties": {
        "fields": {
          "type": "object",
          "properties": {
            "Title": {
              "type": "object",
              "properties": {
                "value": {
                  "type": "string",
                  "analyzer": "ngrams_analyzer"
                },
                "className": {
                  "type": "string",
                  "include_in_all": false,
                  "index": "no"
                }
              }
            },
            "comments": {
              "type": "object",
              "properties": {
                "value": {
                  "type": "string",
                  "analyzer": "ngrams_analyzer"
                },
                "className": {
                  "type": "string",
                  "include_in_all": false,
                  "index": "no"
                }
              }
            }
          }
        }
      }
    }
  }
}

"className" is the attribute I referred to previously. My query is

{
  "multi_match": {
    "query": "FooBar",
    "fields": [ "record.fields.*.value" ]
  }
}

In this case, FooBar only appears in the className attribute, not in the body of the Title or comments fields. If the explain output would help, let me know.

John_D_Ament · September 5, 2015, 6:13pm

I just realized after posting this that the scores for this case were all extremely low. It seems like ES is finding partial matches on the search, but nothing concrete. As a workaround, I'm going to set the min score to 0.5 and see how it goes.

Igor_Motov · September 5, 2015, 7:07pm

So you are searching for FooBar and it gives you back bob? Or are these completely unrelated examples? Sorry, I still cannot figure out what you are trying to do and what does and doesn't work. I would be glad to help if I could easily recreate the issue on my machine. Please see https://www.elastic.co/help for some suggestions about how to make your questions easier to understand. Thanks!

John_D_Ament · September 8, 2015, 11:22am

Yes, this is the verbatim search content/terms. It's purposefully stupid sounding, but I did verify the issue exists with this content.

As best I can tell, we're getting a very low score because a substring of "FooBar" matches against "bob", particularly the "ob" part. I'm not sure if this is the intended behavior for ngrams (to also break up the search term), but it's the only thing I can surmise. I would have expected a score of 0, but if it is breaking up the search term into ngrams as well, this makes some sense to me.

Igor_Motov · September 8, 2015, 12:10pm

Ok, now it makes more sense. Indeed, by default the same analyzer is used for both indexing and searching. So, during search, the search term is tokenized into n-grams, and because multi_match applies the "OR" operator to all tokens by default, it will match any field that contains at least one matching n-gram. To solve this problem, you need to replace the search analyzer with an analyzer that doesn't include the ngram filter.
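To make the overlap concrete, here is a rough Python sketch of what happens to both sides. It is only an approximation of the lowercase + nGram filter chain (min_gram 2, max_gram 20) from the settings above, not Elasticsearch's actual implementation, and the `ngrams` helper is a made-up name for illustration.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """Approximate a lowercase filter followed by an nGram token filter.

    Hypothetical helper: emits every substring of the lowercased token
    whose length is between min_gram and max_gram.
    """
    token = token.lower()
    longest = min(max_gram, len(token))
    return {
        token[start:start + size]
        for size in range(min_gram, longest + 1)
        for start in range(len(token) - size + 1)
    }


# Index time: the stored value "bob" is broken into n-grams.
indexed_terms = ngrams("bob")

# Search time: with the same analyzer, the query is broken up too.
query_terms = ngrams("FooBar")

# With the default "OR" operator, one shared term is enough for a hit.
shared_terms = indexed_terms & query_terms
print(sorted(shared_terms))
```

Under these assumptions the only term the two sides share is "ob", which would explain both the unexpected hit and its very low score.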

John_D_Ament · September 8, 2015, 2:21pm

How would I specify what analyzer to use for search? I'm assuming that this is against the search term only?

Igor_Motov · September 8, 2015, 2:26pm

Sorry, I just realized that the "replace" link I posted above was pointing to the wrong page. The search analyzer can be set with the search_analyzer parameter in the field mapping. So in your case it would look like this:

"comments": {
    "type": "object",
    "properties": {
        "value": {
            "type": "string",
            "analyzer": "ngrams_analyzer",
            "search_analyzer": "standard"
        },
        "className": {
            "type": "string",
            "include_in_all": false,
            "index": "no"
        }
    }
}

See the search_analyzer parameter in the string mapping documentation for more information.
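As a rough illustration of why this helps, the Python sketch below models the two analyzers separately. It is a simplification under stated assumptions: whitespace splitting stands in for the standard tokenizer, and `index_terms`, `search_terms`, and `matches` are hypothetical helper names, not Elasticsearch APIs.

```python
def index_terms(text, min_gram=2, max_gram=20):
    """Index side: split on whitespace, lowercase, then emit n-grams
    (a simplified stand-in for ngrams_analyzer)."""
    terms = set()
    for word in text.lower().split():
        longest = min(max_gram, len(word))
        for size in range(min_gram, longest + 1):
            for start in range(len(word) - size + 1):
                terms.add(word[start:start + size])
    return terms


def search_terms(text):
    """Search side: whole lowercased words only, no n-grams
    (a simplified stand-in for the standard analyzer)."""
    return set(text.lower().split())


def matches(query, field_value):
    """OR semantics: a hit if any search term equals any indexed term."""
    return bool(search_terms(query) & index_terms(field_value))


print(matches("sa", "sam"))      # partial input still matches
print(matches("sam", "sam"))     # full word still matches
print(matches("FooBar", "bob"))  # no shared term, so no hit
```

In this model, partial input such as "sa" still finds "sam" because the n-grams exist on the index side, while "FooBar" no longer matches "bob" because the query is never broken into fragments.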
