I have the following query which returns data from a database of articles published about different video games.
GET articles/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"mediatype_id": 2
}
},
{
"range": {
"inserted_at": {
"gte": "2022-01-01",
"lte": "2022-01-31"
}
}
}
]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
Here is an example response:
{
"_index" : "articles",
"_type" : "_doc",
"_id" : "1507886006",
"_score" : 2.0,
"_source" : {
"author" : "Shubhankar Parijat",
"description" : "It's been rumoured that Microsoft could be working on a remastered collection for another one of its franchises, similar to Halo: The Master Chief Collection.",
"feed_id" : 1919,
"id" : 1507886006,
"inserted_at" : "2022-01-25T08:14:27.756538Z",
"language" : "en",
"mediatype_id" : 2,
"pub_date" : "2022-01-25T08:10:45Z",
"score" : 4.0,
"source" : 0,
"title" : "Another Xbox Franchise is “Going to Get the Master Chief Collection Treatment” – Rumour",
"url" : "https://gamingbolt.com/another-xbox-franchise-is-going-to-get-the-master-chief-collection-treatment-rumour"
}
},
If I wanted to find the top mentioned games in a large set of these articles, how could I aggregate that data, and group it in order of games with the most mentions in the set of articles between two dates to the least?
For example, if “Final Fantasy XVI” had 10k articles and “NieR: Automata” had 5k, Final Fantasy has to be first on the list of results.
As you can see there is no 'game' field in the data returned. Maybe the games mentioned in these articles will have to be aggregated somehow from the url/title/description? I am not sure how to do this. For the example above, this article mentions 'Master Chief Collection"
Thank you.