Hello!
I am trying to find in an index that contains multiple documents related to courses, having a structure like:
[
{
"course_id":"ADE778",
"subscriptions":91,
"source":"page1.com"
},
{
"course_id":"ADE778",
"subscriptions":32,
"source":"page2.com"
},
{
"course_id":"ADE778",
"subscriptions":14,
"source":"page3.com"
},
{
"course_id":"ADE778",
"subscriptions":77,
"source":"page4.com"
},
{
"course_id":"ADE778",
"subscriptions":92,
"source":"page5.com"
},
{
"course_id":"ADE778",
"subscriptions":33,
"source":"page6.com"
},
{
"course_id":"45KFF2",
"subscriptions":12,
"source":"page7.com"
},
{
"course_id":"45KFF2",
"subscriptions":41,
"source":"page8.com"
},
{
"course_id":"45KFF2",
"subscriptions":16,
"source":"page9.com"
},
{
"course_id":"45KFF2",
"subscriptions":18,
"source":"page10.com"
},
{
"course_id":"45KFF2",
"subscriptions":79,
"source":"page11.com"
},
{
"course_id":"45KFF2",
"subscriptions":44,
"source":"page12.com"
}
]
What I was trying to get is getting for each course_id
, the document with the highest subscriptions
value, something like:
[
{
"course_id":"ADE778",
"subscriptions":92,
"source":"page5.com"
},
{
"course_id":"45KFF2",
"subscriptions":79,
"source":"page11.com"
}
]
I was trying with something like this query:
{
"aggs": {
"group_by_course_id": {
"terms": {
"field": "course_id.keyword"
},
"aggs": {
"group_by_course_id": {
"terms": {
"field": "course_id.keyword"
},
"aggs": {
"best_subscriptions": {
"top_hits": {
"size": 1,
"sort": [
{
"subscriptions": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
But I have almost 100K different course_id
values, so I do not know what kind of aggregation should I be using.
Thank you!