How can I apply minimum_should_match to a phonetic match query?

We have a field in our index that is analyzed with the Beider-Morse phonetic filter, and when we query this field we sometimes get some very strange matches.

For example, if you search for "Heine" you will find "Chatten" as a phonetic match. I believe there isn't a serious argument to be made that those two are phonetically similar, no matter what language you consider.

The reason this is considered a match seems to lie in the phonetic synonyms that the original terms are transformed into. Both "Heine" and "Chatten" are transformed into around a dozen phonetic synonyms each, and there is only one overlap: a single synonym that is assigned to both (the synonym "xan"). So 1 out of roughly 12 is not really a good match.
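To make the "1 out of 12" arithmetic concrete, here is a minimal Python sketch. The function name overlap_ratio and the placeholder tokens are made up for illustration. Only "xan" comes from the actual analysis output, and the count of 12 tokens per term is an assumption based on "around a dozen".

```python
def overlap_ratio(query_tokens: set[str], doc_tokens: set[str]) -> float:
    """Share of the query's phonetic tokens that also occur among the document's tokens."""
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


# Placeholder token sets: only "xan" is taken from the post, the rest are dummies
# standing in for the other phonetic synonyms (assumed to be 12 per term in total).
heine = {"xan"} | {f"heine_{i}" for i in range(11)}
chatten = {"xan"} | {f"chatten_{i}" for i in range(11)}

ratio = overlap_ratio(heine, chatten)
print(f"{ratio:.1%} of the query's phonetic tokens overlap")
```

This is the kind of ratio I would like to put a lower bound on.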

I don't have the expertise to determine whether the transformation into the synonyms makes sense or not. That's why my first instinct was to "solve" this problem by introducing a minimum_should_match clause, with the intention that a single matching synonym should not be enough. I planned to play around with some values to get a feel for what would be a good compromise.

But I didn't get that far, because minimum_should_match doesn't seem to work with a phonetic match query.

This is what my query looks like. There are usually a lot more subqueries for other fields, which I removed for the sake of clarity. That's why there is a nested bool query that looks redundant in this simplified example:

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "company": {
              "value": "0"
            }
          }
        },
        {
          "term": {
            "accountNo": {
              "value": "80529335"
            }
          }
        }
      ],
      "should": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "address.street": {
                    "query": "Heinestr.",
                    "minimum_should_match": "3<75%"
                  }
                }
              }
            ],
            "minimum_should_match": "1"
          }
        }
      ],
      "minimum_should_match": "100%"
    }
  }
}

I tried every conceivable value for "minimum_should_match": "3<75%", but as far as I can tell it has no impact on the result at all.

My expectation was that with a value >1, a single matching synonym would no longer be enough to produce a hit.

Any ideas how I could achieve this?

Thanks in advance!

regards
Mario K.

So, in the meantime I learned something.

It seems that minimum_should_match does not apply to the number of phonetic synonyms that represent the original term, but to the number of original search terms instead.

So, for example, if I search for "Quick Brown Fox" with minimum_should_match="3<75%", then "Quick", "Brown" and "Fox" all need to have phonetic matches in an indexed document for it to become a hit ("3<75%" means that with 3 or fewer terms all of them have to match, and the 75% rule only applies once there are more than 3 terms).
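To double-check my reading of the "3<75%" syntax, here is a small Python sketch of how I understand such a value to be resolved. This is my own simplified reimplementation, not Elasticsearch code. The function name required_matches is made up, and it only handles the plain forms "N", "P%" and "N<P%" (no multiple combinations).

```python
def required_matches(spec: str, optional_clauses: int) -> int:
    """Resolve a minimum_should_match value of the form "N", "P%" or "N<P%".

    Simplified sketch: percentages are rounded down, negative values count
    back from the total, and the result is clamped to [0, optional_clauses].
    """
    spec = spec.strip()
    if "<" in spec:
        threshold, rest = spec.split("<", 1)
        if optional_clauses <= int(threshold):
            # At or below the threshold, every clause is required.
            return optional_clauses
        spec = rest
    if spec.endswith("%"):
        percent = int(spec[:-1])
        calc = int(optional_clauses * percent / 100)  # truncates toward zero
        needed = calc if percent >= 0 else optional_clauses + calc
    else:
        value = int(spec)
        needed = value if value >= 0 else optional_clauses + value
    return max(0, min(needed, optional_clauses))


print(required_matches("3<75%", 3))  # "Quick Brown Fox": three terms
print(required_matches("3<75%", 4))  # four terms: the 75% rule applies
print(required_matches("3<75%", 1))  # a single term such as "Heinestr."
```

If the clause count is the number of search terms, then a one-word query like "Heinestr." only ever has one clause, so the required count is 1 no matter which value I pick. That would explain why none of my attempts changed the result.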

Previously, my understanding was that, similar to an NGram/trigram analyzer, the three words "Quick", "Brown" and "Fox" would be transformed into their individual phonetic synonyms during analysis (likely resulting in a list of 20-30 synonyms), and that 75% of those 20-30 synonyms (because there are more than 3) would need to match to get a hit.

Now that I have realized this, I understand why minimum_should_match didn't work the way I wanted it to.

BUT, I'm still looking for a way to change the behaviour that a single matching phonetic synonym is enough for a hit (see above, where I explain why "Heine" vs. "Chatten" is currently a phonetic match).

I had an idea: each matched synonym presumably contributes to the total score, so maybe I could use min_score to define a cutoff for cases where not enough synonyms were matched. But I didn't find a way to restrict min_score to the phonetic subquery.

So, I would appreciate additional ideas.

best regards
Mario
