I'd like to find out what your thoughts are on the following use case:
We have a service that indexes thousands of products from each of our individual clients.
Those clients buy an amount of credits every month.
Each time a user clicks a product of a specific client, that client's credits are reduced by 1 unit.
Our goal is to use this "credits" attribute as a feature (rank feature?) that influences relevance on each query.
So, how would you model this?
1. An index "products" in which each document represents a product. The document would have a "credits" field that represents the amount of credit of the client for that month.
I don't think this would be scalable... For every click we would have to update every document of that client (thousands).
2. A nested/join relationship between a parent "client" document and its related "products".
With this I'd keep the "credits" value only in the parent document (client).
How good would the performance of this be?
3. Like (1), but instead of updating all the documents on every click, updating them only once a day.
Indeed, rank_feature is designed to influence relevance based on numeric features.
And indeed, choosing the right design is challenging.
You are right, updating each document every time there is a click is not scalable. This will not work.
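To make the cost concrete, here is a minimal sketch of what option 1 would need on every click. It only builds the request body for an `_update_by_query` call; the `client_id` field and the `decrement_credits_request` helper are assumptions made for illustration, not something from your setup.

```python
import json


def decrement_credits_request(client_id: str, amount: int = 1) -> dict:
    """Build a body for POST /products/_update_by_query (option 1).

    Assumes every product document carries a hypothetical `client_id`
    keyword field next to its `credits` field.
    """
    return {
        # Selects every product document of this client (thousands of them).
        "query": {"term": {"client_id": client_id}},
        # Each matching document is rewritten to apply the decrement.
        "script": {
            "lang": "painless",
            "source": "ctx._source.credits -= params.amount",
            "params": {"amount": amount},
        },
    }


body = decrement_credits_request("client-42")
print(json.dumps(body, indent=2))
```

A single click turns into a rewrite of every product document of that client, which is why this does not scale.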
The nested field type has the same problem as option 1: a document with nested fields is still a single document in Elasticsearch, so updating any part of it means updating the whole document. The join field type is different: parent and child documents are separate documents in Elasticsearch, and updating one of them doesn't affect the other. This could be a viable solution, but you should measure the performance, as join queries are slower.
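As a rough sketch of the join approach, the snippet below builds a mapping, the two document shapes, and a search body in which the parent's credits contribute to the score of its products. The field names (`relation`, `title`, `credits`) and the helper functions are assumptions for illustration, and using `function_score` with `field_value_factor` inside `has_parent` is just one possible way to do it; it has not been benchmarked.

```python
import json

# Hypothetical mapping: one index holds both client (parent) and
# product (child) documents, linked through a join field.
MAPPING = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"client": "product"}},
            "credits": {"type": "integer"},  # stored on client documents only
            "title": {"type": "text"},       # stored on product documents
        }
    }
}


def client_doc(credits: int) -> dict:
    """Parent document; a click only needs to update this one document."""
    return {"relation": "client", "credits": credits}


def product_doc(title: str, client_id: str) -> dict:
    """Child document; it must be indexed with routing set to client_id."""
    return {"title": title, "relation": {"name": "product", "parent": client_id}}


def search_body(text: str) -> dict:
    """Match products by title and add a score that grows with parent credits."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"title": text}},
                "should": {
                    "has_parent": {
                        "parent_type": "client",
                        "score": True,  # propagate the parent's score to children
                        "query": {
                            "function_score": {
                                "query": {"match_all": {}},
                                "field_value_factor": {
                                    "field": "credits",
                                    "modifier": "log1p",
                                    "missing": 0,
                                },
                            }
                        },
                    }
                },
            }
        }
    }


print(json.dumps(search_body("running shoes"), indent=2))
```

With this layout a click touches only the client document, while the extra cost moves to query time through `has_parent`, which is the part worth measuring.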
And then, if you have, for example, the following mapping: