Custom relevancy

I'm trying to build a process to merge a bunch of records. I have an index of music artists that is not thoroughly cleansed, and I'd like to build a process that loops over each artist and finds similarly spelt ones. The plan is to take these relationships and let a user review them and potentially say that "beyonce" and "beyoncé" are the same artist (bad example).

I'm having trouble doing this using the _score value because of inverse document frequency. For example, suppose I search for "A midsummer nights dream" against the following documents:
A MIDSUMMER NIGHT'S DREAM
A MIDSUMMER NIGHTS DREAM
A MIDSUMMER NIGHT´S DREAM
A MIDSUMMER'S NIGHT'S DREAM
A NARRATED MIDSUMMER NIGHT'S DREAM

The "NARRATED" version appears higher than some of the other results due to the rarity of "narrated".
My query looks like this:

GET artists/artist/_search
{
  "query": {
    "match": {
      "name": {
        "query": "A MIDSUMMER NIGHTS DREAM",
        "fuzziness": 2
      }
    }
  }
}

I'd like to base the score on something like the number of tokens that match the input query. Is such a thing possible? I can't find much in the documentation.
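
One possible direction, offered as a sketch rather than a confirmed solution: wrap each query token in its own constant_score clause inside a bool/should, so every token that matches adds a fixed 1.0 to _score regardless of how rare it is. The helper below only builds the request body. The function name build_token_count_query is hypothetical, the whitespace split is an assumption that will not exactly mirror the index analyzer, and it assumes an Elasticsearch version where constant_score accepts a query under "filter".

```python
import json


def build_token_count_query(text, field="name", fuzziness="AUTO"):
    """Build a search body whose _score is the number of query tokens matched.

    Hypothetical helper: tokens come from a plain whitespace split, which is
    an assumption and may differ from what the index analyzer produces.
    """
    tokens = text.lower().split()
    should = [
        {
            # constant_score ignores term rarity; each matching clause adds `boost`
            "constant_score": {
                "filter": {
                    "match": {field: {"query": token, "fuzziness": fuzziness}}
                },
                "boost": 1.0,
            }
        }
        for token in tokens
    ]
    return {"query": {"bool": {"should": should}}}


body = build_token_count_query("A MIDSUMMER NIGHTS DREAM")
print(json.dumps(body, indent=2))
```

The printed body could be sent to the same artists/artist/_search endpoint as the query above. Note that this only counts matched query tokens. It does not penalize extra tokens in the document, so the "NARRATED" title would tie with the others instead of outranking them.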
