Get top n parents based on sum of score of its children


(Maxime Nay) #1

Hi,

Here is my problem:

We have Authors (parents) and books (children). Authors can be regularly
updated (let's say the number of their fans can vary with the time). Books
are immutable, and each book has a rating.
I want to be able to get the parents that have the highest cumulated rating
score for a particular period of time.
For example I have:

John, who had 100 fans in January and 200 fans in July, wrote:

  • "book 1" with a rating of 12 in January
  • "book 3" with a rating of 19 in March
  • "book 6" with a rating of 9 in June

Jane, who had 50 fans in January and 600 in July, wrote:

  • "book 2" with a rating of 13 in February
  • "book 4" with a rating of 11 in April
  • "book 5" with a rating of 10 in May

In July, I would like to get back, for the time period February - May, the
following results (ordered by cumulated rated):
Jane, 600 fans - 34 cumulated rating
John, 200 fans - 19 cumulated rating

At the same time, I also want to be able to filter based on the content of
the books and also based on characteristics of the authors.

I tried to use a parent/child model, and use a query with has_child and
function_score, but couldn't find a way to do I am looking for.

Any idea?

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7fde7cdb-efc4-4474-b6a1-fe8490f8854b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Maxime Nay) #2

Hi,

I came up with something but I am not sure this is optimal.
Can you let me know what I could improve?

{
"query": {
"bool": {
"must": [
{"has_child" : {
"type" : "books",
"score_mode": "sum",
"query" : {
"function_score": {
"filter": {
"and":[
{"range" : { "publishedAt" : { "gte" :
"2014-07-20T22:04:51.000Z" } } }
]
},
"script_score": { "script": "doc['rating'].value"
}
}
}
}},
{"filtered" : {
"boost": 0,
"query": { "match_all": { }},
"filter": {
"and":[
{"range" : { "fansCount" : { "gt" : 150 } } }
]
}
}}
]
}
}
}

This will return me the cumulated rating as score.

Thanks!

On Tuesday, July 22, 2014 4:15:56 PM UTC-7, Maxime Nay wrote:

Hi,

Here is my problem:

We have Authors (parents) and books (children). Authors can be regularly
updated (let's say the number of their fans can vary with the time). Books
are immutable, and each book has a rating.
I want to be able to get the parents that have the highest cumulated
rating score for a particular period of time.
For example I have:

John, who had 100 fans in January and 200 fans in July, wrote:

  • "book 1" with a rating of 12 in January
  • "book 3" with a rating of 19 in March
  • "book 6" with a rating of 9 in June

Jane, who had 50 fans in January and 600 in July, wrote:

  • "book 2" with a rating of 13 in February
  • "book 4" with a rating of 11 in April
  • "book 5" with a rating of 10 in May

In July, I would like to get back, for the time period February - May, the
following results (ordered by cumulated rated):
Jane, 600 fans - 34 cumulated rating
John, 200 fans - 19 cumulated rating

At the same time, I also want to be able to filter based on the content of
the books and also based on characteristics of the authors.

I tried to use a parent/child model, and use a query with has_child and
function_score, but couldn't find a way to do I am looking for.

Any idea?

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/69c83a6a-4a1f-46a7-b3cb-964f5ab283f7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(system) #3