Dec 8th, 2020 [EN] Rank features for e-commerce search

Russian version

Introduction

Modern e-commerce search is expected to be fast and relevant, and to offer a way to promote certain results. This article demonstrates how the rank_feature and rank_features field types of Elasticsearch can help with these goals. We will use a search engine for a shoe shop as the running example.

Problem 1: Enhancing relevance

Problem 1: We want to find a good way to rank matching results for a user query. It is common to incorporate various popularity metrics into the ranking of search results. These popularity metrics indicate the potential relevance of a document regardless of the query (Google’s PageRank is a famous example of such a metric). In our shop example, these popularity metrics can be the number of times a particular shoe was viewed or bought, or the rating it received from users.

A solution: This can be achieved by modelling our shoes' popularity metrics as rank_feature fields, like this:

PUT shoes
{
  "mappings": {
    "properties": {
      "product_name" : {
        "type" : "text"
      },
      "views_count" : {
        "type" : "rank_feature"
      },
      "ordered_count" : {
        "type" : "rank_feature"
      },
      "rating" : {
        "type" : "rank_feature"
      }
    }
  }
}

Indexing a sample of documents:

POST shoes/_bulk
{ "index" : { "_id" : "1"} }
{"product_name" : "Nike Air Zoom Structure", "views_count" : 900, "ordered_count": 14, "rating" : 4.9 }
{ "index" : { "_id" : "2"} }
{"product_name" : "Nike Air Max", "views_count" : 1780, "ordered_count": 17, "rating" : 4.7} 
{ "index" : { "_id" : "3"} }
{"product_name" : "Adidas ULTRABOOST 20", "views_count" : 2560, "ordered_count": 23, "rating" : 4.9}

Now we can enhance our text-based searches with popularity metrics by combining a match query with the special rank_feature query. For example, the query below first finds and scores the products that match a user query, and then adds extra score based on the popularity of each product. Thus, the more popular a product is in terms of views, orders and rating, the more its score is increased, moving it higher in the overall ranking. The query can be tuned further by providing a separate boost parameter for each individual rank_feature query.

GET shoes/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "product_name": "<user_query>"
          }
        }
      ],
      "should": [
        {
          "rank_feature" : {
            "field" : "views_count"
          }
        },
        {
          "rank_feature" : {
            "field" : "ordered_count"
          }
        },
        {
          "rank_feature" : {
            "field" : "rating"
          }
        }
      ]
    }
  }
}
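The per-feature boost mentioned above can be sketched as follows. This is only an illustration of how the request body might be assembled in client code: the helper name build_popularity_query and the boost values are hypothetical, and the weights would need tuning against real traffic. The only Elasticsearch feature assumed here is the optional boost parameter of the rank_feature query.

```python
import json


def build_popularity_query(user_query, boosts):
    """Build a bool query: text match in `must`, popularity signals in `should`.

    `boosts` maps a rank_feature field name to the boost for that feature.
    The helper name and the boost values are illustrative assumptions.
    """
    should = [
        {"rank_feature": {"field": field, "boost": boost}}
        for field, boost in boosts.items()
    ]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"product_name": user_query}}],
                "should": should,
            }
        }
    }


# Example: weight orders more heavily than views, and rating less.
body = build_popularity_query(
    "nike",
    {"views_count": 1.0, "ordered_count": 2.0, "rating": 0.5},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET shoes/_search. Since the rank_feature clauses sit in should, they only add to the score of documents that already satisfy the match clause.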

Problem 2: Curating search results

Problem 2: We want to promote certain products when users enter specific keywords. Such results are often called pinned, advertised or promoted search results.

A solution: This can be achieved by assigning to each product a set of keywords or categories that characterize it. For each category we assign a numeric weight that expresses how strongly we want this product to be boosted when this category is searched. Categories can be very diverse and sparse, i.e. there can be a lot of categories, and each category can relate to a relatively small number of products. In Elasticsearch we can model these categories with the rank_features field type. Note that the request below creates the shoes index again, so if you are following along, delete the index from Problem 1 first or use a different index name:

PUT shoes
{
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text"
      },
      "categories": {
        "type": "rank_features"
      }
    }
  }
}

Below is an example of an indexing request that promotes "Nike" shoes in the "sneakers" category by assigning higher values to Nike shoes and lower values to other shoes in the "categories.sneakers" feature.

POST shoes/_bulk
{ "index" : { "_id" : "1"} }
{"product_name" : "Nike Air Zoom Structure", "categories" : {"sneakers" : 10, "running" : 10, "athleisure" : 2}  }
{ "index" : { "_id" : "2"} }
{"product_name" : "Nike Air Max", "categories" : {"sneakers" : 10, "athleisure" : 10} } 
{ "index" : { "_id" : "3"} }
{"product_name" : "Adidas ULTRABOOST 20", "categories" : {"sneakers" : 8, "running" : 10, "athleisure" : 3} }

Thus, if we execute a rank_feature query on "categories.sneakers", "Nike" shoes will be ranked at the top.

GET shoes/_search
{
  "query": {
    "rank_feature" : {
      "field": "categories.sneakers"
    }
  }
}

Results will be ranked differently for the "athleisure" category, because the ranking is based on the numeric values that we assigned to the shoes for this category.

GET shoes/_search
{
  "query": {
    "rank_feature" : {
      "field": "categories.athleisure"
    }
  }
}
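To see why the two queries order the products differently, the scoring can be approximated offline. The sketch below assumes the query uses the saturation function, score = S / (S + pivot), which is the documented default of the rank_feature query when no function is specified. The pivot here is the geometric mean of the sample values, which is only a stand-in for the approximate value Elasticsearch derives from index statistics, so the absolute scores will differ from a real cluster. The ordering does not depend on the pivot, because saturation grows monotonically with the feature value.

```python
import math

# Category weights copied from the bulk request above.
DOCS = {
    "1": {"sneakers": 10, "running": 10, "athleisure": 2},  # Nike Air Zoom Structure
    "2": {"sneakers": 10, "athleisure": 10},                # Nike Air Max
    "3": {"sneakers": 8, "running": 10, "athleisure": 3},   # Adidas ULTRABOOST 20
}


def saturation(value, pivot):
    """Saturation function: below 0.5 when value < pivot, above 0.5 otherwise."""
    return value / (value + pivot)


def rank_by_category(docs, category):
    """Return (doc_id, score) pairs, best first, for docs that have the feature."""
    values = {
        doc_id: cats[category] for doc_id, cats in docs.items() if category in cats
    }
    # Assumption: geometric mean of the sample values as the pivot.
    pivot = math.exp(sum(math.log(v) for v in values.values()) / len(values))
    scored = [(doc_id, saturation(v, pivot)) for doc_id, v in values.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for category in ("sneakers", "athleisure"):
    order = [doc_id for doc_id, _ in rank_by_category(DOCS, category)]
    print(category, order)
```

For "sneakers" the two Nike products tie ahead of the Adidas one, while for "athleisure" the order is document 2, then 3, then 1. A document that lacks the feature altogether (document 2 for "running") is not matched by the query at all.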

Technical Details

Elasticsearch encodes the feature values of rank_feature(s) fields as term frequencies, and all rank features that belong to the same field are stored in a single field. For example, if the "categories" field has the following mapping:

"categories" : {
    "type" : "rank_features"
}

then the indexing request:

{ "index" : { "_id" : "1"} }
{"product_name" : "Nike Air Zoom Structure", "categories" : {"sneakers" : 10, "running" : 10, "athleisure" : 2}  }

creates a document with a field "categories" that has 3 terms:

  1. "sneakers" with term frequency of 10
  2. "running" with term frequency of 10
  3. "athleisure" with term frequency of 2
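The list above is the conceptual picture. The stored integer is, as far as I understand Lucene's FeatureField, not the raw number but a truncated form of the value's 32-bit float representation, which is why the Elasticsearch documentation notes that rank features keep only about 9 significant bits. The sketch below models that encoding in Python; the 15-bit shift and the function names are assumptions based on that reading, not an official API.

```python
import struct


def encode_feature_value(value):
    """Model of the encoding: keep the top bits of the float32 representation.

    Assumption: the low 15 mantissa bits are dropped, leaving 8 explicit
    mantissa bits. Only strictly positive values are supported.
    """
    if value <= 0:
        raise ValueError("rank feature values must be strictly positive")
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 15


def decode_feature_value(freq):
    """Inverse of encode_feature_value, up to the dropped precision."""
    return struct.unpack(">f", struct.pack(">I", freq << 15))[0]


for original in (10.0, 2.0, 4.9):
    freq = encode_feature_value(original)
    restored = decode_feature_value(freq)
    relative_error = abs(restored - original) / original
    print(original, freq, restored, relative_error)
```

Values such as 10 and 2 survive the round trip exactly, while a value such as 4.9 comes back slightly smaller, with a relative error below 2^-8 (about 0.4%). For ranking purposes this loss is usually negligible, but it means that feature values that are very close together may become indistinguishable.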

Because Elasticsearch stores all rank features in a single field, we can have a large number of them without running into a mapping explosion, which would happen if we allocated a separate field for each individual category.

As an added bonus, the rank_feature query is very fast, because it can efficiently skip non-competitive documents, i.e. documents whose term frequencies are too low for them to reach the top results.

Conclusion

This article has demonstrated, using an e-commerce search as the example, how rank_feature(s) fields can be useful for tuning relevance. Ranking is a complex and evolving topic. The goal of this article is not to prescribe any particular method, but rather to show examples of how rank_feature(s) fields can help in ranking.
