Retrieve 6 products for the top 3 users, 2 per user, by highest matching score


(Yao) #1

I have a collection of products that belong to a few users, like:

[
{ id: 1, user_id: 1, description: "blabla...", ... },
{ id: 2, user_id: 2, description: "blabla...", ... },
{ id: 3, user_id: 2, description: "blabla...", ... },
{ id: 4, user_id: 3, description: "blabla...", ... },
{ id: 5, user_id: 4, description: "blabla...", ... },
{ id: 6, user_id: 2, description: "blabla...", ... },
{ id: 7, user_id: 3, description: "blabla...", ... },
{ id: 8, user_id: 4, description: "blabla...", ... },
{ id: 9, user_id: 2, description: "blabla...", ... },
{ id: 10, user_id: 3, description: "blabla...", ... },
{ id: 11, user_id: 4, description: "blabla...", ... },
...
]

(The real data has more fields; the most important are the first three:
product id, user id, and product description.)

I'd like to retrieve the 2 highest-scoring products for each of the top 3
users, where users are ranked by the matching score of their products. The
matching condition is that the description includes "fashion" and some other
keywords; here I just use "fashion" as an example:

[
{ id: 2, user_id: '2', description: "blabla...", ..., _score: 100},
{ id: 3, user_id: '2', description: "blabla...", ..., _score: 95},
{ id: 4, user_id: '3', description: "blabla...", ..., _score: 90},
{ id: 5, user_id: '4', description: "blabla...", ..., _score: 80},
{ id: 7, user_id: '3', description: "blabla...", ..., _score: 70},
{ id: 8, user_id: '4', description: "blabla...", ..., _score: 65},
...
]

I have 3 possible ways to try:

  1. Use a terms facet in an inner query to get the unique user_id values,
    then use them to restrict the user ids of an outer query that matches the
    description against keywords like "fashion".

I don't know how to implement this in ES (I am stuck on iterating over the
facet terms and on building the user_id restriction from a facet subquery).
In SQL it would look something like:

select id, user_id, description
from product
where user_id in (
    select distinct user_id
    from product
    limit 3)
order by _score desc
limit 6
/* 6 = 2 * 3 */

But it cannot guarantee that the top 6 products come from 3 different users.

Also, according to the following two links, it seems that retrieving
additional per-term information from a terms facet has not been implemented
in ES so far.
http://elasticsearch-users.115913.n3.nabble.com/Terms-stats-facet-Additional-information-td4035199.html
https://github.com/elasticsearch/elasticsearch/issues/256

  2. Query for descriptions matching keywords like "fashion", and at the same
    time aggregate by user_id with the per-user count limited to 2, then pick
    the top 6 products with the highest matching scores.

I still don't know how to implement this in ES.
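
For what it is worth, here is a rough sketch of what the 2nd way might look
like as a request body. It assumes an Elasticsearch version that supports a
terms aggregation with a top_hits sub-aggregation (which, as far as I know,
only arrived in releases after this thread). The function name
build_top_users_query and the aggregation names top_users, best_score and
top_products are made up for illustration, and the exact script syntax for
_score varies between versions. The code only builds the JSON body; it does
not talk to a cluster.

```python
import json


def build_top_users_query(keyword, users=3, per_user=2):
    """Build a search body: top `users` user_ids by best score, `per_user` hits each."""
    return {
        "size": 0,  # only the aggregation result is needed, not the flat hit list
        "query": {"match": {"description": keyword}},
        "aggs": {
            "top_users": {  # hypothetical aggregation name
                "terms": {
                    "field": "user_id",
                    "size": users,
                    # rank users by the score of their best matching product
                    "order": {"best_score": "desc"},
                },
                "aggs": {
                    # assumption: string shorthand for an inline script
                    "best_score": {"max": {"script": "_score"}},
                    # top_hits sorts by _score descending by default
                    "top_products": {"top_hits": {"size": per_user}},
                },
            }
        },
    }


body = build_top_users_query("fashion")
print(json.dumps(body, indent=2))
```

If this works on your version, each bucket under top_users should carry its
own small hit list, so 3 buckets with 2 hits each would give the 6 products.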

  3. Use brute force with multiple queries until I find the top 3 users, each
    with their 2 highest-scoring products.

I mean use a hash map where the key is user_id and the value is how many
times it has appeared. Query with the matching keywords first, then iterate
over the results and check the hash map: if the value is less than 2, add the
product to the final result list, otherwise skip it.
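
The hash-map idea above can be sketched in a few lines. This is only an
illustration in Python, assuming the hits are already sorted by _score in
descending order. The function name top_products_per_user is invented, and
so are the extra sample hits (ids 6 and 1 and their scores).

```python
def top_products_per_user(hits, max_users=3, per_user=2):
    """Keep at most `per_user` hits for each of the first `max_users` users seen.

    `hits` must already be sorted by _score, highest first.
    """
    counts = {}    # user_id -> number of products kept so far
    selected = []
    for hit in hits:
        uid = hit["user_id"]
        if uid not in counts and len(counts) >= max_users:
            continue  # a 4th (or later) user: not one of the top users
        if counts.get(uid, 0) >= per_user:
            continue  # this user already has enough products
        counts[uid] = counts.get(uid, 0) + 1
        selected.append(hit)
        if len(selected) == max_users * per_user:
            break     # 6 = 2 * 3, nothing more to collect
    return selected


# Scores for ids 2, 3, 4, 5, 7, 8 follow the example; ids 6 and 1 are made up.
sample_hits = [
    {"id": 2, "user_id": 2, "_score": 100},
    {"id": 3, "user_id": 2, "_score": 95},
    {"id": 4, "user_id": 3, "_score": 90},
    {"id": 6, "user_id": 2, "_score": 85},
    {"id": 5, "user_id": 4, "_score": 80},
    {"id": 7, "user_id": 3, "_score": 70},
    {"id": 8, "user_id": 4, "_score": 65},
    {"id": 1, "user_id": 1, "_score": 60},
]

print([h["id"] for h in top_products_per_user(sample_hits)])
```

Note that if one of the top users has fewer than 2 matching products, the
result has fewer than 6 entries, and a single page of results may not be
enough, so more queries may be needed.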

Please let me know if you can figure out how to do this the 1st or 2nd way.

Thanks in advance.
Yao

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/723e0e59-e587-42b5-9fa4-390a27f2e7a6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Yao) #2

What about a nested or parent/child query? How could this be achieved with one?

On Thursday, May 8, 2014 4:45:36 PM UTC-7, Yao Li wrote:

Also, according to the following two links, it seems facet terms specific
information iteration feature has not been implemented in ES so far.

http://elasticsearch-users.115913.n3.nabble.com/Terms-stats-facet-Additional-information-td4035199.html

https://github.com/elasticsearch/elasticsearch/issues/256




(ElasticSearch Users mailing list) #3

Not sure if this would help, but you can first do a terms facet/aggregation
on user_id and pull back the top 3 ids, say user5, user7, and user20.

Then you run a second request using the _msearch API, in which you construct
three independent search queries (one for each user). You will get back
3 independent search results, each with its own ranking/scoring.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-multi-search.html#search-multi-search
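
As a rough illustration of the second step, the _msearch body is a sequence
of header/body JSON lines, one pair per search. The sketch below only builds
that text in Python. The index name "products" and the function name
build_msearch_body are assumptions, and the bool/filter query syntax is the
one from more recent Elasticsearch versions; older versions may need a
different way to combine the match with the user_id filter.

```python
import json


def build_msearch_body(user_ids, keyword, per_user=2, index="products"):
    """Build a newline-delimited _msearch body: one search per user id."""
    lines = []
    for uid in user_ids:
        # header line: which index this search runs against (assumed name)
        lines.append(json.dumps({"index": index}))
        # body line: match the keyword, restricted to one user, top `per_user` hits
        lines.append(json.dumps({
            "size": per_user,
            "query": {
                "bool": {
                    "must": {"match": {"description": keyword}},
                    "filter": {"term": {"user_id": uid}},
                }
            },
        }))
    # the multi search body must end with a newline
    return "\n".join(lines) + "\n"


print(build_msearch_body([5, 7, 20], "fashion"))
```

The responses come back in the same order as the searches, so the first
result set belongs to the first user id, and so on.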



(Yao) #4

In the multi search, do you mean using 5, 7, 20 (user5, user7, and user20) as the user ids for routing and then picking the top 2 products for each of them?

I use Play Framework and Scala. Do you know how to embed the facet term results (user ids) into the multi search? (As far as I know, the ES Java API can return the user ids from the facet, but only with a product count for each id; retrieving the products behind each facet term has not been implemented. If I could iterate over the products for each id, the problem would be much easier.)

http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/java-facets.html
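
To make the question concrete, this is roughly the step I am after, written
as a small Python sketch rather than Scala. It assumes the response has the
shape of a terms aggregation result (aggregations, then the aggregation name,
then buckets with key and doc_count), optionally with a nested hit list per
bucket. The names extract_top_users, top_users and top_products and the
sample response are all hypothetical; a facet response would be shaped
differently.

```python
def extract_top_users(response, agg_name="top_users", hits_name="top_products"):
    """Return (user_id, [product ids]) pairs from a terms-aggregation response."""
    result = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # the nested hit list is optional; fall back to an empty list
        hits = bucket.get(hits_name, {}).get("hits", {}).get("hits", [])
        result.append((bucket["key"], [hit["_id"] for hit in hits]))
    return result


# Hand-written sample response, not real cluster output.
sample_response = {
    "aggregations": {
        "top_users": {
            "buckets": [
                {
                    "key": 5,
                    "doc_count": 12,
                    "top_products": {
                        "hits": {"hits": [
                            {"_id": "41", "_score": 3.2},
                            {"_id": "17", "_score": 2.9},
                        ]}
                    },
                },
                {"key": 7, "doc_count": 4},
            ]
        }
    }
}

print(extract_top_users(sample_response))
```

The user ids pulled out this way are what would be fed into the per-user
searches of the multi search.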

Thanks a lot!

