I am using Elasticsearch and I want to group my results by a specific field, returning only the most recent document per group. When scoring and sorting, I want the documents I am not returning (the older ones in each group) to be ignored.
I have tried approaching this with collapse; however, the "hidden" documents are still taken into account when sorting, which I would like to avoid.
Example
In the following example I have two groups of documents. I would like to group them by email, take the most recent document of each group by created_at, and sort the results by rating in descending order.
With the data of the example, the most recent documents are Aaa 1 (with email aaa@aaa.com) and Bbb 4 (with email bbb@bbb.com). Since I want to sort by rating descending, I am expecting Bbb 4 (rating 4) and then Aaa 1 (rating 1). However, they are returned the other way around, because Aaa 2 and Aaa 3 are also taken into account when sorting (the aaa@aaa.com group gets the sort value 30, which is the rating of Aaa 3), and that is what I want to avoid.
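To make the difference concrete, here is a minimal Python sketch that reproduces both orderings in memory using the sample documents from this question. It does not call Elasticsearch; the function names (by_group, observed_order, desired_order) are made up for illustration, and observed_order only mimics the response shown further down (each group ranked by its highest rating), not the actual internals of collapse.

```python
# Sample documents copied from the question (description omitted).
DOCS = [
    {"name": "Aaa 1", "rating": 1, "created_at": "2021-01-01", "email": "aaa@aaa.com"},
    {"name": "Aaa 2", "rating": 20, "created_at": "2020-01-01", "email": "aaa@aaa.com"},
    {"name": "Aaa 3", "rating": 30, "created_at": "2019-01-01", "email": "aaa@aaa.com"},
    {"name": "Bbb 4", "rating": 4, "created_at": "2021-01-02", "email": "bbb@bbb.com"},
    {"name": "Bbb 5", "rating": 5, "created_at": "2020-01-02", "email": "bbb@bbb.com"},
]


def by_group(docs):
    """Group documents by email, keeping first-seen group order."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc["email"], []).append(doc)
    return groups


def newest(group):
    """Most recent document of a group (ISO dates compare correctly as strings)."""
    return max(group, key=lambda d: d["created_at"])


def observed_order(docs):
    """What I get: groups ranked by the highest rating of ANY document in the group,
    while the displayed inner hit is the newest document."""
    ranked = sorted(
        by_group(docs).values(),
        key=lambda g: max(d["rating"] for d in g),
        reverse=True,
    )
    return [newest(g)["name"] for g in ranked]


def desired_order(docs):
    """What I want: keep only the newest document per group, then sort by its rating."""
    latest = [newest(g) for g in by_group(docs).values()]
    return [d["name"] for d in sorted(latest, key=lambda d: d["rating"], reverse=True)]


print(observed_order(DOCS))  # ['Aaa 1', 'Bbb 4']
print(desired_order(DOCS))   # ['Bbb 4', 'Aaa 1']
```

The only difference between the two functions is which rating is used as the sort key of a group: the maximum over the whole group, or the rating of the newest document.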
How can I write my query so that it returns Bbb 4 and then Aaa 1? Should I be using the top_hits aggregation instead?
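For reference, this is roughly what I imagine the aggregation-based variant would look like, written as a Python function that only builds the request body. It is an untested sketch under several assumptions: that the top_metrics aggregation is available in my version, that a terms aggregation can be ordered by a top_metrics sub-aggregation, and that the aggregation names (by_group, latest_rank, latest_document) and the helper latest_per_group_body are hypothetical names I made up here. I have not run it against a cluster.

```python
import json


def latest_per_group_body(group_field="email", date_field="created_at",
                          rank_field="rating", size=10):
    """Build a search body that buckets by group_field and tries to order the
    buckets by rank_field of the newest document in each bucket (assumption:
    terms ordering by a top_metrics sub-aggregation is supported)."""
    return {
        "size": 0,  # only the aggregation result is of interest
        "aggs": {
            "by_group": {
                "terms": {
                    "field": group_field,
                    "size": size,
                    # Order buckets by the rating of the newest document only.
                    "order": {f"latest_rank.{rank_field}": "desc"},
                },
                "aggs": {
                    # Rating of the single newest document in the bucket.
                    "latest_rank": {
                        "top_metrics": {
                            "metrics": {"field": rank_field},
                            "sort": {date_field: "desc"},
                        }
                    },
                    # The newest document itself, for display.
                    "latest_document": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{date_field: {"order": "desc"}}],
                            "_source": ["name", "email", "rating"],
                        }
                    },
                },
            }
        },
    }


# Print the body so it can be pasted after GET test/_search.
print(json.dumps(latest_per_group_body(), indent=2))
```

Even if this works, a terms aggregation does not paginate the way collapsed hits do, so I am not sure it is a real replacement for collapse.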
PUT test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "email": {
        "type": "keyword"
      },
      "description": {
        "type": "text"
      },
      "rating": {
        "type": "integer"
      },
      "created_at": {
        "type": "date"
      }
    }
  }
}
POST test/_doc
{
  "name": "Aaa 1",
  "rating": 1,
  "created_at": "2021-01-01",
  "description": "A quick fox",
  "email": "aaa@aaa.com"
}
POST test/_doc
{
  "name": "Aaa 2",
  "rating": 20,
  "created_at": "2020-01-01",
  "description": "jumps over",
  "email": "aaa@aaa.com"
}
POST test/_doc
{
  "name": "Aaa 3",
  "rating": 30,
  "created_at": "2019-01-01",
  "description": "the fence",
  "email": "aaa@aaa.com"
}
POST test/_doc
{
  "name": "Bbb 4",
  "rating": 4,
  "created_at": "2021-01-02",
  "description": "behind the house",
  "email": "bbb@bbb.com"
}
POST test/_doc
{
  "name": "Bbb 5",
  "rating": 5,
  "created_at": "2020-01-02",
  "description": "we live in",
  "email": "bbb@bbb.com"
}
GET test/_search
{
  "_source": false,
  "track_total_hits": false,
  "query": {
    "bool": {
      "should": {
        "match_all": {}
      }
    }
  },
  "collapse": {
    "field": "email",
    "inner_hits": [
      {
        "name": "last_document",
        "size": 1,
        "_source": ["name", "email", "rating"],
        "sort": [
          {
            "created_at": {
              "order": "desc"
            }
          }
        ]
      }
    ]
  },
  "sort": [
    {
      "rating": {
        "order": "desc"
      }
    }
  ]
}
This returns:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "max_score" : null,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "bccEn3oBRQ1dOOnBe3nD",
        "_score" : null,
        "fields" : {
          "email" : [
            "aaa@aaa.com"
          ]
        },
        "sort" : [
          30
        ],
        "inner_hits" : {
          "last_document" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "test",
                  "_type" : "_doc",
                  "_id" : "a8cEn3oBRQ1dOOnBdXli",
                  "_score" : null,
                  "_source" : {
                    "name" : "Aaa 1",
                    "rating" : 1,
                    "email" : "aaa@aaa.com"
                  },
                  "sort" : [
                    1609459200000
                  ]
                }
              ]
            }
          }
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "b8cEn3oBRQ1dOOnBiHkx",
        "_score" : null,
        "fields" : {
          "email" : [
            "bbb@bbb.com"
          ]
        },
        "sort" : [
          5
        ],
        "inner_hits" : {
          "last_document" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "test",
                  "_type" : "_doc",
                  "_id" : "bscEn3oBRQ1dOOnBgHlt",
                  "_score" : null,
                  "_source" : {
                    "name" : "Bbb 4",
                    "rating" : 4,
                    "email" : "bbb@bbb.com"
                  },
                  "sort" : [
                    1609545600000
                  ]
                }
              ]
            }
          }
        }
      }
    ]
  }
}