Elasticsearch ranking shorter/less relevant titles first

Cannon_Moyer · August 29, 2019, 7:30pm

I'm working on a product search with Elasticsearch 7.3. The product titles are not formatted the same but there is nothing I can do about this.

Some titles might look like this:

Ford Hub Bearing

And others like this:

Hub bearing for a Chevrolet Z71 - model number 5528923-01

If someone searches for "Chevrolet Hub Bearing", the "Ford Hub Bearing" product ranks #1 and the Chevrolet part ranks #2. If I remove all the extra text (model number 5528923-01) from the product title, the Chevrolet part ranks #1 as desired.

Unfortunately I am unable to fix the product titles, so I need to be able to rank the Chevrolet part as #1 when someone searches for "Chevrolet Hub Bearing". I have simply set the type of the name field to text and applied the standard analyzer in my index. Here is my query code:

{
    query: {
        bool: {
            must: [
                {
                    multi_match: {
                        fields: ['name'],
                        query: "Chevrolet Hub Bearing"
                    }
                }
            ]
        }
    }
}

forloop · August 30, 2019, 7:29am

Matches found in shorter length fields end up scoring higher than matches found in longer length fields.
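
To make that concrete, here is a minimal Python sketch of the textbook BM25 formula applied to the two titles from the question. It is not Lucene's exact implementation: the tokenizer is a rough stand-in for the standard analyzer, and the idf values and average field length are made-up numbers for a catalog where "chevrolet" is a common term. Disabling norms is modelled by setting b = 0, which drops the length term.

```python
import re

def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25(query, title, idf, avgdl, k1=1.2, b=0.75):
    # Textbook BM25; b controls how strongly field length is penalized.
    terms = tokenize(title)
    length_factor = 1 - b + b * len(terms) / avgdl
    score = 0.0
    for q in set(tokenize(query)):
        tf = terms.count(q)
        if tf:
            score += idf[q] * tf * (k1 + 1) / (tf + k1 * length_factor)
    return score

# Hypothetical statistics (assumptions, not taken from a real index).
idf = {"chevrolet": 1.0, "hub": 2.0, "bearing": 2.0}
avgdl = 4.0

titles = {
    "ford": "Ford Hub Bearing",
    "chevrolet": "Hub bearing for a Chevrolet Z71 - model number 5528923-01",
}
query = "Chevrolet Hub Bearing"

# b=0.75 is the usual default; b=0 approximates a field with norms disabled.
with_norms = {k: bm25(query, t, idf, avgdl) for k, t in titles.items()}
without_norms = {k: bm25(query, t, idf, avgdl, b=0.0) for k, t in titles.items()}

print("with norms:   ", with_norms)
print("without norms:", without_norms)
```

Under these assumed numbers the short Ford title scores higher while length normalization is active, even though it matches only two of the three terms, and the Chevrolet title scores higher once the length term is dropped.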

In many cases this makes sense, but sometimes you don't want the field length to contribute to the relevancy score. When you don't want it to contribute, you can disable norms in the field mapping:

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "norms": false
      }
    }
  }
}

It's better to disable norms when the index and its mapping are created, as above, but if you already have an index, the change can be applied to the existing mapping:

PUT my_index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "norms": false
    }
  }
}

Just be aware that:

1. Norms will not be removed instantly; they are removed gradually as old segments are merged into new segments while new documents are indexed.
2. Once disabled, they cannot be re-enabled in the index; you would need to create a new index.

I think you should be able to use the update_by_query API to help with point 1:

POST my_index/_update_by_query?conflicts=proceed

which will cause each existing document to be updated (reindexed).
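
Note that the mapping examples above use a field called title, while the field in the question is called name. As a rough sketch, the two requests for an existing index could be assembled like this in Python. The helper name disable_norms_requests is hypothetical, my_index is the placeholder index name used above, and nothing is sent to a cluster; the function only builds the method, path, and body.

```python
import json

def disable_norms_requests(index, field):
    """Return (method, path, body) tuples; nothing is sent to a cluster."""
    mapping_body = {"properties": {field: {"type": "text", "norms": False}}}
    return [
        # Step 1: disable norms on the existing text field.
        ("PUT", f"/{index}/_mapping", mapping_body),
        # Step 2: touch every document so it is reindexed into new segments.
        ("POST", f"/{index}/_update_by_query?conflicts=proceed", None),
    ]

for method, path, body in disable_norms_requests("my_index", "name"):
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

The printed output can be pasted into the Kibana console or sent with any HTTP client.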

system · September 27, 2019, 7:29am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
