Order of suggestion results for same score with fuzzy match


#1

I am seeing unexpected behavior from a suggestion query with fuzzy matching turned on. Several results share the same score, including the exact match, but the exact match is not listed first.

I am using elasticsearch.js 11 against an ES 2.3.5 cluster.

The query is as follows:

client.suggest({
  index: 'myindex',
  body: {
    "suggestions": {
      "text": "Politics",
      "completion": {
        "field": "suggestTag",
        "size": 10,
        "fuzzy": true,
        "context": {
          "community": "Test"
        }
      }
    }
  }
});

The result is:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "suggestions": [
    {
      "text": "Politics",
      "offset": 0,
      "length": 7,
      "options": [
        {
          "text": "Politica",
          "score": 1000,
          "payload": {
            "_id": "Politica"
          }
        },
        {
          "text": "PoliticalTheory",
          "score": 1000,
          "payload": {
            "_id": "PoliticalTheory"
          }
        },
        {
          "text": "Politics",
          "score": 1000,
          "payload": {
            "_id": "Politics"
          }
        },
        {
          "text": "Poltics",
          "score": 1000,
          "payload": {
            "_id": "Poltics"
          }
        },
        {
          "text": "Middle East Politics",
          "score": 2,
          "payload": {
            "_id": "MiddleEastPolitics"
          }
        },
        {
          "text": "Political Campaigns",
          "score": 1,
          "payload": {
            "_id": "PoliticalCampaigns"
          }
        },
        {
          "text": "Political Science",
          "score": 1,
          "payload": {
            "_id": "PoliticalScience"
          }
        }
      ]
    }
  ]
}

As you can see, the exact match for "Politics" has the same score (1000) as several other matches. I would expect the exact match to come first in that case.

The relevant piece of the mapping is:

"suggestTag": {
  "type": "completion",
  "analyzer": "simple",
  "preserve_separators": false,
  "payloads": true,
  "context": {
    "community": {
      "type": "category",
      "default": [
        "*"
      ]
    }
  }
}

Is there a way to change the sort behavior/scoring in that case? I have not seen anything in this forum that deals with the suggestion query in particular.


(Michael McCandless) #2

Alas, this is a limitation of the underlying Lucene suggester, I believe. The sources (FuzzySuggester.java) have this TODO:

    // TODO: right now there's no penalty for fuzzy/edits,
    // ie a completion whose prefix matched exactly what the
    // user typed gets no boost over completions that
    // required an edit, which get no boost over completions
    // requiring two edits.  I suspect a multiplicative
    // factor is appropriate (eg, say a fuzzy match must be at
    // least 2X better weight than the non-fuzzy match to
    // "compete") ... in which case I think the wFST needs
    // to be log weights or something ...
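To make the multiplicative idea in that TODO concrete, here is a rough client-side sketch in JavaScript. It is not Lucene code: `editDistance` and `penalizedScore` are hypothetical helpers, the factor of 2 is only the example value from the comment, and comparing against a same-length prefix with plain Levenshtein distance (no transpositions) is a simplification of what the suggester really does.

```javascript
// Illustrative only: this is NOT how Lucene's FuzzySuggester is implemented.

// Plain Levenshtein distance (insertions, deletions, substitutions).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: divide the weight by `factor` once per edit needed to
// turn the typed text into the same-length prefix of the suggestion.
function penalizedScore(typed, option, factor = 2) {
  const t = typed.toLowerCase();
  const prefix = option.text.toLowerCase().slice(0, t.length);
  const edits = editDistance(t, prefix);
  return option.score / Math.pow(factor, edits);
}
```

With a penalty like this, a fuzzy completion would need at least twice the weight of an exact one to rank alongside it, which is the behavior the comment describes.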

Patches welcome!

Mike McCandless
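Until the suggester itself penalizes edits, one possible workaround is to reorder the options on the client after the response comes back. The sketch below assumes the response shape shown in the question; `exactFirst` is a hypothetical helper, not part of elasticsearch.js. The comparison is case-insensitive on the assumption that this matches the lowercasing done by the `simple` analyzer in the mapping.

```javascript
// Client-side workaround sketch; `exactFirst` is a hypothetical helper.
// It keeps the server's score order and only breaks ties in favor of an
// option whose text equals what the user typed (ignoring case).
function exactFirst(typed, options) {
  const t = typed.toLowerCase();
  const isExact = (o) => (o.text.toLowerCase() === t ? 1 : 0);
  return options
    .map((option, index) => ({ option, index }))
    .sort((a, b) =>
      b.option.score - a.option.score ||        // higher score first
      isExact(b.option) - isExact(a.option) ||  // exact match wins a tie
      a.index - b.index)                        // otherwise keep server order
    .map(({ option }) => option);
}

// Sample data copied from the result in the question (payloads omitted).
const options = [
  { text: 'Politica', score: 1000 },
  { text: 'PoliticalTheory', score: 1000 },
  { text: 'Politics', score: 1000 },
  { text: 'Poltics', score: 1000 },
  { text: 'Middle East Politics', score: 2 }
];

const reordered = exactFirst('Politics', options);
console.log(reordered.map((o) => o.text));
```

In real use the input would be `response.suggestions[0].options` from the `client.suggest` call. This only changes the order within the returned page, so if `size` cuts off the exact match on the server side, it cannot be recovered this way.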

