Retrieve 6 products: the 2 highest-scoring products for each of the top 3 users


(Yao) #1

I have a collection of products that belong to a few users, for example:

[
{ id: 1, user_id: 1, description: "blabla...", ... },
{ id: 2, user_id: 2, description: "blabla...", ... },
{ id: 3, user_id: 2, description: "blabla...", ... },
{ id: 4, user_id: 3, description: "blabla...", ... },
{ id: 5, user_id: 4, description: "blabla...", ... },
{ id: 6, user_id: 2, description: "blabla...", ... },
{ id: 7, user_id: 3, description: "blabla...", ... },
{ id: 8, user_id: 4, description: "blabla...", ... },
{ id: 9, user_id: 2, description: "blabla...", ... },
{ id: 10, user_id: 3, description: "blabla...", ... },
{ id: 11, user_id: 4, description: "blabla...", ... },
...
]

(The real data has more fields; the most important ones are the product id, the user id, and the product description.)

I'd like to retrieve the 2 highest-scoring products for each of the top 3 users, i.e. the users whose products have the highest matching scores. The matching condition is that the description includes "fashion" and some other keywords; here I just use "fashion" as an example:

[
{ id: 2, user_id: '2', description: "blabla...", ..., _score: 100},
{ id: 3, user_id: '2', description: "blabla...", ..., _score: 95},
{ id: 4, user_id: '3', description: "blabla...", ..., _score: 90},
{ id: 5, user_id: '4', description: "blabla...", ..., _score: 80},
{ id: 7, user_id: '3', description: "blabla...", ..., _score: 70},
{ id: 8, user_id: '4', description: "blabla...", ..., _score: 65},
...
]

I can think of 3 possible approaches:

  1. Use a terms facet in a nested query to get the unique user_id values, then use them as the user id set for an outer query that matches the description against keywords like "fashion".

I don't know how to implement this in ES (I am stuck on iterating over the facet terms and on building the user_id set from a subquery with a facet). In SQL it would be something like:

select id, user_id, description
from product
where user_id in (
    select distinct user_id
    from product
    limit 3)
order by _score desc
limit 6
/* 6 = 2 products * 3 users */

But this cannot guarantee that the top 6 products come from 3 different users.

Also, according to the following two links, it seems that ES does not yet support iterating over term-specific information in a terms facet:
http://elasticsearch-users.115913.n3.nabble.com/Terms-stats-facet-Additional-information-td4035199.html

https://github.com/elasticsearch/elasticsearch/issues/256

What about a nested or parent/child query? How could that be done?

  2. Query the description field for keywords like "fashion"; at the same time, aggregate by user_id and limit the count to 2 per user; then pick the top 6 products with the highest matching scores.

I still don't know how to implement this in ES.
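
For reference, this second approach might be expressed roughly as the request body below. It is only a sketch and assumes an Elasticsearch version that offers the terms aggregation with a top_hits sub-aggregation and allows ordering buckets by a max-of-_score sub-aggregation; none of that is confirmed for the version discussed here, and the exact script syntax differs between versions. The function name build_top_users_query and the aggregation names top_users, best_score and top_products are made up for illustration; only the fields user_id and description come from the data above.

```python
import json


def build_top_users_query(keyword, max_users=3, per_user=2):
    """Build a search body: top `max_users` users, `per_user` best hits each.

    Hypothetical helper; it only constructs the JSON body and sends nothing.
    """
    return {
        # Suppress the flat hit list; the per-user hits come from the aggregation.
        "size": 0,
        "query": {"match": {"description": keyword}},
        "aggs": {
            "top_users": {
                "terms": {
                    "field": "user_id",
                    "size": max_users,
                    # Rank user buckets by their best-scoring product.
                    "order": {"best_score": "desc"},
                },
                "aggs": {
                    # Assumed script form; older releases use a plain string.
                    "best_score": {"max": {"script": {"source": "_score"}}},
                    # Keep the highest-scoring products inside each user bucket.
                    "top_products": {"top_hits": {"size": per_user}},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_top_users_query("fashion"), indent=2))
```

The response to such a request would be grouped per user bucket rather than returned as one flat list, so the client would still need to flatten the buckets and sort by _score to get the layout shown in the expected result above.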

  3. Use brute force: run multiple queries until I have found the top 3 users, each with their 2 highest-scoring products.

I mean using a hash map where the key is a user_id and the value is how many times it has appeared. Query with the matching keywords first, then iterate over the intermediate results and check the hash map: if the value is less than 2, add the product to the final result list; otherwise skip it.
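
The hash map idea could look roughly like the sketch below. It assumes the hits arrive already sorted by _score in descending order (for example, page by page from repeated queries), and it treats the first 3 distinct users seen as the top 3 users. The function name pick_top_products and the sample scores for ids 1 and 6 are invented for illustration; the other ids and scores follow the expected result above.

```python
from typing import Dict, Iterable, List


def pick_top_products(hits: Iterable[dict],
                      max_users: int = 3,
                      per_user: int = 2) -> List[dict]:
    """Keep at most `per_user` hits for each of the first `max_users` users.

    `hits` must be ordered by _score descending.
    """
    counts: Dict[object, int] = {}  # user_id -> number of products kept
    picked: List[dict] = []
    for hit in hits:
        user_id = hit["user_id"]
        if user_id not in counts:
            if len(counts) >= max_users:
                continue  # a user outside the top `max_users`: skip
            counts[user_id] = 0
        if counts[user_id] < per_user:
            counts[user_id] += 1
            picked.append(hit)
            if len(picked) == max_users * per_user:
                break  # result is full, no need to read more hits
    return picked


# Sample hits in descending score order (ids 1 and 6 are made-up extras).
sample_hits = [
    {"id": 2, "user_id": 2, "_score": 100},
    {"id": 3, "user_id": 2, "_score": 95},
    {"id": 4, "user_id": 3, "_score": 90},
    {"id": 6, "user_id": 2, "_score": 85},  # third hit for user 2: skipped
    {"id": 5, "user_id": 4, "_score": 80},
    {"id": 1, "user_id": 1, "_score": 75},  # fourth user: skipped
    {"id": 7, "user_id": 3, "_score": 70},
    {"id": 8, "user_id": 4, "_score": 65},
]

if __name__ == "__main__":
    print([hit["id"] for hit in pick_top_products(sample_hits)])
    # prints [2, 3, 4, 5, 7, 8]
```

The drawback is that the number of hits to read is not known in advance: if one user dominates the top of the ranking, many pages may be needed before 3 users with 2 products each are found.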

Please let me know if you can figure out how to do this with the 1st or 2nd approach.

Thanks in advance.
Yao


(system) #2