Modeling parent/child relationship boosting child search with parent attribute

ffknob · October 2, 2020, 5:09pm

Hello

I'd like to find out what your thoughts are on the following use case:

We have a service that indexes thousands of products for each of our individual clients.
Those clients buy a monthly amount of credits.
Each time a user clicks a product of a specific client, that client's credits are reduced by 1 unit.
Our goal is to use this "credits" attribute as a feature (rank feature?) that influences relevance on each query.

So, how would you model this?

(1) An index "products" in which each document represents a product. The document would have a "credits" field that represents the amount of credit of the client for that month.

I don't think this would be scalable: for every click we would have to update every document of that client (thousands).

(2) A nested/join relationship between a parent "client" index and its related "products".

With this I'd keep the "credits" value only in the parent document (client).

How good would the performance of this be?

(3) Like (1), but instead of updating all the documents on every click, update them about once a day.

Do you have any other approach you could suggest?

Thank you

mayya · October 2, 2020, 10:22pm

Indeed, rank_feature is designed to influence relevance based on some numeric features.
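
For context, the default scoring function of the rank_feature query is saturation, which maps a feature value S to S / (S + pivot), so the boost grows with credits but levels off below 1. A minimal Python sketch of that curve follows; the pivot of 100 is a made-up value for illustration (when no pivot is given, Elasticsearch derives one from the indexed feature values).

```python
def saturation(value: float, pivot: float) -> float:
    """Saturation curve: value / (value + pivot), in (0, 1) for positive inputs."""
    if value <= 0 or pivot <= 0:
        raise ValueError("value and pivot must be strictly positive")
    return value / (value + pivot)


# Hypothetical pivot, chosen only for illustration.
PIVOT = 100.0

for credits in (10, 100, 1000):
    # More credits give a higher score, with diminishing returns.
    print(credits, round(saturation(credits, PIVOT), 3))
```

A client whose credits equal the pivot gets a score of 0.5, so the pivot is roughly the "typical" credit level around which differences matter most.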

And indeed, choosing the right design is challenging.

You are right: updating each document every time there is a click is not scalable. This will not work.
The nested field type has the same problem as option 1: each doc with nested fields corresponds to a single document in Elasticsearch, so updating any part of it leads to updating the whole document.
The join field type is different, as parent and child documents are separate documents in Elasticsearch, and updating one of them doesn't affect the other. This could be a viable solution, but you should measure the performance, as join queries are slower.

If you have, for example, the following mapping:

{
  "mappings": {
    "properties": {
      "credits": {
        "type": "rank_feature"
      },
      "my_join_field": {
        "type": "join",
        "relations": {
          "clients": "products"
        }
      }
    }
  }
}
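
To make the join layout concrete, here is a hedged Python sketch that builds a _bulk request body for one client and its products under this mapping. The index name "products", the ids "client-1" / "product-1", and the helper names are assumptions for illustration only. The one thing to get right is that each child is routed by its parent's id, so parent and children end up on the same shard.

```python
import json

INDEX = "products"  # hypothetical index name


def client_doc(credits: int) -> dict:
    """Parent document: holds the credits and declares itself a 'clients' node."""
    return {"credits": credits, "my_join_field": {"name": "clients"}}


def product_doc(product_name: str, client_id: str) -> dict:
    """Child document: points at its parent through the join field."""
    return {
        "product_name": product_name,
        "my_join_field": {"name": "products", "parent": client_id},
    }


def bulk_body(client_id: str, credits: int, products: dict) -> str:
    """Build an NDJSON body for the _bulk API.

    `products` maps product id -> product name. Children are routed by the
    parent id so that they land on the same shard as their parent.
    """
    lines = [
        {"index": {"_index": INDEX, "_id": client_id}},
        client_doc(credits),
    ]
    for product_id, name in products.items():
        lines.append(
            {"index": {"_index": INDEX, "_id": product_id, "routing": client_id}}
        )
        lines.append(product_doc(name, client_id))
    return "\n".join(json.dumps(line) for line in lines) + "\n"


body = bulk_body(
    "client-1", 500, {"product-1": "mountain bike", "product-2": "road bike"}
)
print(body)
```

The resulting string can be sent as the body of a POST to the _bulk endpoint with any HTTP client.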

Your query can then look like this:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "product_name": "bike"
          }
        }
      ],
      "should": {
        "has_parent": {
          "score": true,
          "parent_type": "clients",
          "query": {
            "rank_feature": {
              "field": "credits"
            }
          }
        }
      }
    }
  }
}

Here, "bike" products from clients with more credits will be scored higher.
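
With this layout, a click only has to touch the single parent client document. A minimal sketch of that update, assuming the hypothetical index name "products" and client id "client-1", using a scripted partial update. One caveat to check against your version: as far as I know, rank_feature values must be strictly positive, so a client whose credits reach zero would need special handling.

```python
import json


def click_update_request(index: str, client_id: str, cost: int = 1) -> tuple:
    """Return (method, path, body) for a scripted update of the parent document.

    Only the client document is rewritten; its product documents are untouched.
    """
    body = {
        "script": {
            "lang": "painless",
            "source": "ctx._source.credits -= params.cost",
            "params": {"cost": cost},
        }
    }
    return "POST", f"/{index}/_update/{client_id}", body


method, path, body = click_update_request("products", "client-1")
print(method, path)
print(json.dumps(body, indent=2))
```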

Periodic updating (option 3) could also be viable, for example as a job that runs at night when search queries are rare.
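
A rough sketch of what such a nightly job could send, assuming each product document carries its own "credits" field plus a keyword field identifying its client. The field name "client_id" and the helper name are hypothetical. It builds one _update_by_query request per client that copies the current credit balance onto all of that client's products.

```python
import json


def nightly_sync_request(index: str, client_id: str, credits: int) -> tuple:
    """Return (method, path, body) that sets `credits` on all products of a client."""
    body = {
        # "client_id" is an assumed keyword field on each product document.
        "query": {"term": {"client_id": client_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.credits = params.credits",
            "params": {"credits": credits},
        },
    }
    return "POST", f"/{index}/_update_by_query", body


method, path, body = nightly_sync_request("products", "client-1", 420)
print(method, path)
print(json.dumps(body, indent=2))
```

Since the balances are only refreshed once a day, ranking would lag behind the real credit balance by up to a day, which is the trade-off against the join approach.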

I guess you would need to measure the performance of the 2nd and 3rd options on your data and decide which is better.

ffknob · October 3, 2020, 12:46pm

Thank you so much @mayya for taking the time and offering a very didactic answer!

system · October 31, 2020, 12:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
