Custom relevancy

martingrayson · July 3, 2017, 2:21pm

I'm trying to build a process to merge a bunch of records. I have an index of music artists that is not thoroughly cleansed, and I'd like to build a process that loops over each artist and finds similarly spelt ones. The plan is to take these relationships and let a user review them and potentially say that "beyonce" and "beyoncé" are the same artist (bad example).

I'm having trouble doing this using the _score value because of inverse document frequency. For example, if I search for "A midsummer nights dream" on the following documents:
A MIDSUMMER NIGHT'S DREAM
A MIDSUMMER NIGHTS DREAM
A MIDSUMMER NIGHT´S DREAM
A MIDSUMMER'S NIGHT'S DREAM
A NARRATED MIDSUMMER NIGHT'S DREAM

The "NARRATED" version appears higher than some of the other results due to the rarity of "narrated".
My query looks like this:

GET artists/artist/_search
{
  "query": {
    "match": {
      "name": {
        "query": "A MIDSUMMER NIGHTS DREAM",
        "fuzziness": 2
      }
    }
  }
}

I'd like to base the score on, perhaps, the number of tokens that match the input query. Is such a thing possible? I can't find much in the documentation.
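
To make the idea concrete, here is a rough sketch of what I have in mind, not something I have confirmed works well. It splits the input into tokens and wraps a fuzzy match for each token in a constant_score clause inside a bool/should, so that each matching token should add a flat 1.0 to _score regardless of how rare it is. The helper name token_count_query, the whitespace tokenizer, and the choice of "AUTO" fuzziness are all my own assumptions, and the tokens produced here may not line up with what the field's analyzer produces.

```python
import json


def token_count_query(field, text, fuzziness="AUTO"):
    """Build a search body where each matched query token adds 1.0 to _score.

    Hypothetical helper: the whitespace split below is a naive stand-in
    for whatever analyzer the field actually uses.
    """
    tokens = text.lower().split()
    should = [
        {
            "constant_score": {
                # The filter context ignores term statistics, so rarity
                # of a token does not influence the score.
                "filter": {
                    "match": {field: {"query": token, "fuzziness": fuzziness}}
                },
                "boost": 1.0,
            }
        }
        for token in tokens
    ]
    # At least one token has to match for a document to be returned.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = token_count_query("name", "A MIDSUMMER NIGHTS DREAM")
print(json.dumps(body, indent=2))
```

The printed body could be sent to artists/artist/_search. One limitation I can already see: this only counts how many query tokens matched, so a document with extra tokens (such as the "NARRATED" one) would probably tie with the others rather than rank below them, and the tie would need to be broken some other way.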

