Elasticsearch ranking shorter/less relevant titles first

I'm working on a product search with Elasticsearch 7.3. The product titles are not formatted consistently, and there is nothing I can do about that.

Some titles might look like this:

Ford Hub Bearing

And others like this:

Hub bearing for a Chevrolet Z71 - model number 5528923-01

If someone searches for "Chevrolet Hub Bearing" the "Ford Hub Bearing" product ranks #1 and the Chevrolet part ranks #2. If I remove all the extra text (model number 5528923-01) from the product title, the Chevrolet part ranks #1 as desired.

Unfortunately I am unable to fix the product titles, so I need the Chevrolet part to rank #1 when someone searches for "Chevrolet Hub Bearing". I have simply set the type of name to text and applied the standard analyzer in my index. Here is my query code:

{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": ["name"],
            "query": "Chevrolet Hub Bearing"
          }
        }
      ]
    }
  }
}

Matches found in shorter fields end up scoring higher than matches found in longer fields, because the default BM25 similarity normalizes each score by field length.
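To make the effect concrete, here is a rough Python sketch of a BM25-style score for the two titles from the question. The IDF weights and the average field length are invented for illustration (they are not taken from any real index), the token lists are hand-written approximations of what the standard analyzer might produce, and constant factors that don't affect ranking are left out. Disabling norms is approximated by setting b to 0, which makes the length term the same for every document; Lucene's internal handling differs in detail.

```python
K1 = 1.2   # BM25 default term-frequency saturation
B = 0.75   # BM25 default length-normalization strength


def bm25_term_score(idf, tf, doc_len, avg_len, b=B, k1=K1):
    """Simplified BM25 contribution of one query term in one document."""
    length_factor = 1.0 - b + b * (doc_len / avg_len)
    return idf * tf / (tf + k1 * length_factor)


def score(query_terms, doc_tokens, idf, avg_len, b=B):
    """Sum the per-term contributions for every query term the document contains."""
    total = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)
        if tf:
            total += bm25_term_score(idf[term], tf, len(doc_tokens), avg_len, b=b)
    return total


# Assumed values, chosen only to illustrate the effect.
ASSUMED_IDF = {"chevrolet": 1.0, "hub": 2.0, "bearing": 2.0}
ASSUMED_AVG_LEN = 5.0

query = ["chevrolet", "hub", "bearing"]
docs = {
    "ford": "ford hub bearing".split(),
    "chevrolet": "hub bearing for a chevrolet z71 model number 5528923 01".split(),
}

# With length normalization (norms enabled) and without it (b = 0).
with_norms = {k: score(query, d, ASSUMED_IDF, ASSUMED_AVG_LEN) for k, d in docs.items()}
without_norms = {k: score(query, d, ASSUMED_IDF, ASSUMED_AVG_LEN, b=0.0) for k, d in docs.items()}

print("with norms:   ", with_norms)
print("without norms:", without_norms)
```

Under these assumed numbers the short Ford title outranks the Chevrolet title while length normalization is active, even though it matches only two of the three query terms, and the order flips once the length term is held constant.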

In many cases this makes sense, but sometimes you don't want the field length to contribute to the relevancy score. When you don't want it to contribute, you can disable norms in the field mapping (shown here for a field called title; in your index the field is name):

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "norms": false
      }
    }
  }
}

It's better to disable norms when you create the index and its mapping, as above, but if you already have an index, the same setting can be applied to the existing mapping:

PUT my_index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "norms": false
    }
  }
}

Just be aware that:

  1. Norms will not be removed instantly; they are dropped gradually as old segments are merged into new segments while new documents are indexed.
  2. Once disabled, norms cannot be re-enabled on that field; you would need to create a new index.

I think you should be able to use the update_by_query API to help with point 1:

POST my_index/_update_by_query?conflicts=proceed

This causes each existing document to be updated (reindexed).
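If you would rather script these two steps than run them from the console, something like the following Python sketch might work. It uses only the standard library; the host http://localhost:9200, the index name my_index, and the helper names are placeholders I've assumed, so adjust them for your cluster. Nothing is sent until you call send().

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local node; change for your cluster
INDEX = "my_index"                # placeholder index name from the examples above


def build_mapping_request(field="title"):
    """Build the PUT <index>/_mapping request that disables norms on a text field."""
    body = {"properties": {field: {"type": "text", "norms": False}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_update_by_query_request():
    """Build the POST <index>/_update_by_query?conflicts=proceed request (empty body)."""
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_update_by_query?conflicts=proceed",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage against a running cluster (not executed here):
# send(build_mapping_request("name"))
# send(build_update_by_query_request())
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before anything touches the index.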
