I'm very new to elasticsearch and would like some guidance on the best approach to handle my problem.
I have a million articles that I want to search. Each article is ranked beforehand for each user using a custom function, so every article has as many scores as there are users. When searching, I want to rank the returned articles by the custom scores of one particular user. What's the best way to store a (possibly) unlimited number of scores for each article?
How do I implement this in a way that is scalable and supported by elasticsearch?
None of the ideas I came up with seem to fit the way elasticsearch is meant to be used.
-
IDEA 1: Store an array of scores on each article. Each array position holds the score for one user. When getting results for user-id 42, I simply rank by the 42nd element of each array.
PROBLEM: Arrays don't work as expected. They have no order. According to this I would need to read that value from disk every time for every article (slow), or use the nested datatype, which has a maximum nesting level (hence doesn't scale).
-
IDEA 2: Create separate fields for each score on an article. E.g. "score_user_1", "score_user_2", ... This way I can dynamically compose my field from the user-id and then simply sort the results by that field.
PROBLEM: I don't think I should create an unlimited number of fields per document. By default the maximum number of fields per index (index.mapping.total_fields.limit) is 1000. What if I someday have 100,000 users? Would this still work, and up to what point? It feels to me like this is not how elasticsearch is meant to work.
-
IDEA 3: Create a parent-child relation between article and score. When searching, I could join all articles with their children (i.e. scores) that belong to the given user-id. The articles are then sorted by the value stored in their child.
PROBLEM: The documentation says that joining is very slow and should be avoided at all costs. If one really must use it, it should be used as little as possible. I would be using it across millions of documents.
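To make ideas 2 and 3 concrete, here is a rough sketch of the request bodies I have in mind, written as plain Python dicts so no client library is needed. The field names (body, user_id, score) and the child relation name (score) are made up for illustration, and I haven't verified how either query performs at this scale.

```python
import json


def idea2_query(user_id: int, text: str) -> dict:
    """IDEA 2: sort by a per-user field whose name is built from the user id."""
    field = f"score_user_{user_id}"
    return {
        # "body" is a hypothetical full-text field on the article
        "query": {"match": {"body": text}},
        "sort": [{field: {"order": "desc"}}],
    }


def idea3_query(user_id: int, text: str) -> dict:
    """IDEA 3: rank each article by the score stored on its matching child."""
    # Select the child belonging to this user and use its "score" field
    # (hypothetical name) as the relevance score of that child.
    child_score = {
        "function_score": {
            "query": {"term": {"user_id": user_id}},
            "field_value_factor": {"field": "score"},
            "boost_mode": "replace",
        }
    }
    return {
        "query": {
            "bool": {
                # has_child passes the child's score up to the parent article
                "must": [
                    {
                        "has_child": {
                            "type": "score",  # hypothetical child relation name
                            "score_mode": "max",
                            "query": child_score,
                        }
                    }
                ],
                # The text match only filters; it does not affect ranking
                "filter": [{"match": {"body": text}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(idea2_query(42, "elasticsearch"), indent=2))
    print(json.dumps(idea3_query(42, "elasticsearch"), indent=2))
```

In the second body the results come back in the default order (by relevance score), which here is just the user's stored score, so no explicit sort clause is needed.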