Modeling a parent/child relationship: boosting child search with a parent attribute

Hello

I'd like to find out what your thoughts are on the following use case:

  • We have a service that indexes thousands of products for each of our clients
  • Those clients buy a number of credits each month
  • Each time a user clicks a product of a specific client, that client's credits are reduced by 1
  • Our goal is to use this "credits" attribute as a feature (rank feature?) that influences relevance on each query

So, how would you model this?

  1. An index "products" in which each document represents a product. The document would have a "credits" field that represents the amount of credit of the client for that month.

I don't think this would scale: for every click we would have to update every document of that client (thousands).

  2. A nested/join relationship between a parent "client" index and its related "products".

With this I'd keep the "credits" value only in the parent document (client).

How good would the performance of this be?

  3. Like (1), but instead of updating all the documents on every click, updating them about once a day.

Do you have any other approach you could suggest?

Thank you

Indeed, rank_feature is designed to influence relevance based on some numeric features.

And indeed, choosing the right design is challenging.

  1. You are right, updating every document of a client on each click is not scalable. This will not work.
  2. The nested field type has the same problem as option 1: a document together with its nested fields is indexed as a single document in Elasticsearch, so updating any part of it means updating the whole document.
    The join field type is different, as parent and child documents are separate documents in Elasticsearch, and updating one of them doesn't affect the other. This could be a viable solution, but you should measure the performance, as join queries are slower.

If you have, for example, the following mapping:

{
  "mappings": {
    "properties": {
      "credits": {
        "type": "rank_feature"
      },
      "my_join_field": {
        "type": "join",
        "relations": {
          "clients": "products"
        }
      }
    }
  }
}
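To make the parent/child layout concrete, here is a minimal Python sketch that only builds the HTTP requests for indexing one client, indexing one of its products, and recording a click. The index name "products", the document ids, and the helper names are assumptions for illustration. "my_join_field", "credits" and "product_name" come from the mapping and query in this thread. Children have to be routed to the parent's shard, hence the routing parameter. Also note that rank_feature fields only accept strictly positive values, so a client at zero credits would need separate handling.

```python
INDEX = "products"            # hypothetical index name
JOIN_FIELD = "my_join_field"  # join field from the mapping above


def client_request(client_id, credits):
    """Build the index request for a parent ("clients") document."""
    if credits <= 0:
        # rank_feature fields only accept strictly positive values
        raise ValueError("credits must be > 0 for a rank_feature field")
    body = {"credits": credits, JOIN_FIELD: {"name": "clients"}}
    return "PUT", f"/{INDEX}/_doc/{client_id}", body


def product_request(product_id, client_id, product_name):
    """Build the index request for a child ("products") document."""
    body = {
        "product_name": product_name,
        JOIN_FIELD: {"name": "products", "parent": client_id},
    }
    # A child must live on the same shard as its parent,
    # so it is routed by the parent id.
    return "PUT", f"/{INDEX}/_doc/{product_id}?routing={client_id}", body


def click_request(client_id, credits_left):
    """Build a partial update that touches only the parent document."""
    if credits_left <= 0:
        raise ValueError("credits must be > 0 for a rank_feature field")
    body = {"doc": {"credits": credits_left}}
    return "POST", f"/{INDEX}/_update/{client_id}", body


if __name__ == "__main__":
    print(client_request("client-1", 1000))
    print(product_request("product-1", "client-1", "mountain bike"))
    print(click_request("client-1", 999))
```

The point of this layout is that a click results in a single small update on the client document, while the thousands of product documents stay untouched. Since parents and children share one index, their ids must not collide, which is why the sketch prefixes them.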

Your query can then look like this:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "product_name": "bike"
          }
        }
      ],
      "should": {
        "has_parent": {
          "score": true,
          "parent_type": "clients",
          "query": {
            "rank_feature": {
              "field": "credits"
            }
          }
        }
      }
    }
  }
}

Here, "bike" products from clients with more credits will score higher.
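To give a feel for how much the credits move the score: by default the rank_feature query uses a saturation function, score = S / (S + pivot), where S is the feature value and the pivot defaults to an approximation of the geometric mean of the values in the index. The sketch below reproduces that formula in Python and builds a rank_feature clause with an explicit pivot. The pivot of 500 and the helper names are assumptions chosen for illustration.

```python
def saturation(value, pivot):
    """Saturation function used by rank_feature: value / (value + pivot)."""
    if value <= 0 or pivot <= 0:
        raise ValueError("value and pivot must be strictly positive")
    return value / (value + pivot)


def credits_clause(pivot):
    """Build a rank_feature clause on "credits" with an explicit pivot."""
    return {"rank_feature": {"field": "credits", "saturation": {"pivot": pivot}}}


if __name__ == "__main__":
    pivot = 500  # hypothetical pivot, tune it on your own data
    for credits in (100, 500, 5000):
        print(credits, round(saturation(credits, pivot), 3))
    # 100 0.167
    # 500 0.5
    # 5000 0.909
    print(credits_clause(pivot))
```

The contribution is always below 1 and flattens out for large values, so a client with ten times more credits does not get ten times the boost. This value is added to the score of the match clause, so the pivot (and an optional boost) decides how strongly credits compete with text relevance.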

  3. Periodic updates could also be viable, for example if this is a job that runs at night when search queries are rare.
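For the periodic option, one possible shape of the nightly job is an update-by-query per client that copies the current credit balance onto all of that client's product documents. The Python sketch below only builds the request bodies. It assumes a flat index (no join field) named "products" in which every product has a hypothetical "client_id" keyword field and "credits" is mapped as rank_feature directly on the product. These names are assumptions, not something from your setup.

```python
INDEX = "products"  # hypothetical index name


def nightly_sync_request(client_id, credits):
    """Build an _update_by_query request that sets credits on all
    products of one client ("client_id" is a hypothetical field)."""
    if credits <= 0:
        # rank_feature fields only accept strictly positive values
        raise ValueError("credits must be > 0 for a rank_feature field")
    body = {
        "query": {"term": {"client_id": client_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.credits = params.credits",
            "params": {"credits": credits},
        },
    }
    # conflicts=proceed keeps going if a document changed meanwhile
    return "POST", f"/{INDEX}/_update_by_query?conflicts=proceed", body


def boosted_query(text):
    """Build the search body: match on the name, boosted by credits."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"product_name": text}}],
                "should": [{"rank_feature": {"field": "credits"}}],
            }
        }
    }


if __name__ == "__main__":
    print(nightly_sync_request("client-1", 750))
    print(boosted_query("bike"))
```

With this variant the search side stays simple, because the rank_feature clause runs directly on the product documents and no has_parent query is needed. The price is that the credits seen at query time can be up to a day old.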

I guess you would need to measure the performance of the 2nd and 3rd options on your data and decide which works better.

Thank you so much @mayya for taking the time and offering a very didactic answer!