Take top n documents per bucket and do further sub-aggregation


(janek sendrowski) #1

Hi,

I'd like to take the best n documents per user (the user is stored as user_id in
my index). On its own this wouldn't be a problem; it can be done like this:

{
  "query": {
    "match": {
      "field": {
        "query": "query_string"
      }
    }
  },
  "aggs": {
    "group_by_user": {
      "terms": {
        "field": "user_id"
      },
      "aggs": {
        "top_n": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

But now I'd like to run a sub-aggregation on those top hits to calculate an
expensive custom score, and that isn't possible, because top_hits is a metrics
aggregation and can't have sub-aggregations:

"aggs": {
  "max_score_per_user": {
    "max": {
      "script": "advanced_scoring"
    }
  }
}
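To make the intent concrete, here is roughly the request I would like to run. Nesting max_score_per_user under top_n is exactly what Elasticsearch rejects; the field names and the advanced_scoring script are placeholders from my setup:

{
  "aggs": {
    "group_by_user": {
      "terms": { "field": "user_id" },
      "aggs": {
        "top_n": {
          "top_hits": { "size": 10 },
          "aggs": {
            "max_score_per_user": {
              "max": { "script": "advanced_scoring" }
            }
          }
        }
      }
    }
  }
}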

My scoring algorithm is very expensive, so I can't apply it to the full set of
documents per user that the query returns.

I also can't use the rescore feature, which provides a window_size parameter,
because its window applies to the top hits of the whole result set, while I
first have to bucket the documents per user and only then take the best n docs
per user.
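For reference, this is the kind of rescore request I mean (window_size and the advanced_scoring script are placeholders). The window_size caps how many of the overall top hits are rescored, not how many per user bucket, which is why it doesn't help here:

{
  "query": {
    "match": {
      "field": { "query": "query_string" }
    }
  },
  "rescore": {
    "window_size": 100,
    "query": {
      "rescore_query": {
        "function_score": {
          "script_score": { "script": "advanced_scoring" }
        }
      }
    }
  }
}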

A range query on the score would work, but the scores aren't comparable across
queries because of the IDF, so I can't define a fixed range.

So either I have to make the scores comparable (which would be simple, except
that the constant_score query doesn't work with the match query I am using), or
I have to find a way to cap the bucket size at a certain limit while ordering
by relevance.

I've been trying for days to find a way to do this, but it seems it's not
possible.

