Hi, I am researching a way to filter data after grouping. For example, here is the data:
index: domain_user_search
doc:
[
{
"domain_key": "PElvL8li",
"delete_yn": "Y",
"@timestamp": "2024-12-03T05:49:51.000Z"
},
{
"domain_key": "3zit1iG4",
"delete_yn": "N",
"@timestamp": "2024-12-03T05:47:56.000Z"
},
{
"domain_key": "MPfyjQrP",
"delete_yn": "N",
"@timestamp": "2024-12-03T05:47:56.000Z"
},
{
"domain_key": "vBYwrqvO",
"delete_yn": "N",
"@timestamp": "2024-12-03T05:46:56.000Z"
},
{
"domain_key": "PElvL8li",
"delete_yn": "N",
"@timestamp": "2024-12-03T05:45:51.000Z"
}
]
With data like this, I run the following query:
GET /domain_user_mapping/_search
{
"size": 0,
"aggs": {
"group_by_domain_key": {
"terms": {
"field": "domain_key.keyword",
"size": 3
},
"aggs": {
"latest_doc": {
"filter": {
"match_all": {}
},
"aggs": {
"sorted_docs": {
"terms": {
"field": "@timestamp",
"size": 1,
"order": {
"_key": "desc"
}
},
"aggs": {
"filtered_doc_content": {
"filter": {
"term": {
"delete_yn.keyword": "N"
}
}
}
}
}
}
}
}
}
}
}
This query returns the following response:
"aggregations": {
"group_by_domain_key": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 3,
"buckets": [
{
"key": "PElvL8li",
"doc_count": 2,
"latest_doc": {
"doc_count": 2,
"sorted_docs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 1,
"buckets": [
{
"key": 1733204991000,
"key_as_string": "2024-12-03T05:49:51.000Z",
"doc_count": 1,
"filtered_doc_content": {
"doc_count": 0
}
}
]
}
}
},
{
"key": "3zit1iG4",
"doc_count": 1,
"latest_doc": {
"doc_count": 1,
"sorted_docs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1733204876000,
"key_as_string": "2024-12-03T05:47:56.000Z",
"doc_count": 1,
"filtered_doc_content": {
"doc_count": 1
}
}
]
}
}
},
{
"key": "MPfyjQrP",
"doc_count": 1,
"latest_doc": {
"doc_count": 1,
"sorted_docs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1733204876000,
"key_as_string": "2024-12-03T05:47:56.000Z",
"doc_count": 1,
"filtered_doc_content": {
"doc_count": 1
}
}
]
}
}
}
]
}
}
For the key PElvL8li, the most recent document has delete_yn = Y, but the bucket is still included, so the response shows "PElvL8li", "3zit1iG4", "MPfyjQrP".
What I want is for PElvL8li to be excluded from the response because its latest document has delete_yn = Y, so that I receive these 3 results instead: "3zit1iG4", "MPfyjQrP", "vBYwrqvO".
The condition is that documents with delete_yn = Y must not simply be filtered out before grouping. The documents should first be grouped by domain_key and reduced to the latest one per group, and only then should groups whose latest document has delete_yn = Y be excluded.
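To make the intended result concrete, here is a minimal Python sketch of the logic I am after, run against the sample documents above. It only illustrates the desired semantics and is not an Elasticsearch query. The function name latest_non_deleted_keys, the size parameter, and the ordering of results by latest timestamp are assumptions made for this example.

```python
from datetime import datetime

# Sample documents copied from the example data above.
docs = [
    {"domain_key": "PElvL8li", "delete_yn": "Y", "@timestamp": "2024-12-03T05:49:51.000Z"},
    {"domain_key": "3zit1iG4", "delete_yn": "N", "@timestamp": "2024-12-03T05:47:56.000Z"},
    {"domain_key": "MPfyjQrP", "delete_yn": "N", "@timestamp": "2024-12-03T05:47:56.000Z"},
    {"domain_key": "vBYwrqvO", "delete_yn": "N", "@timestamp": "2024-12-03T05:46:56.000Z"},
    {"domain_key": "PElvL8li", "delete_yn": "N", "@timestamp": "2024-12-03T05:45:51.000Z"},
]


def parse_ts(value):
    # Timestamps are ISO 8601 in UTC with milliseconds and a trailing "Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def latest_non_deleted_keys(documents, size=3):
    # Hypothetical helper: step 1, keep only the latest document per domain_key.
    latest = {}
    for doc in documents:
        key = doc["domain_key"]
        if key not in latest or parse_ts(doc["@timestamp"]) > parse_ts(latest[key]["@timestamp"]):
            latest[key] = doc

    # Step 2, drop groups whose latest document is marked as deleted.
    kept = [doc for doc in latest.values() if doc["delete_yn"] == "N"]

    # Step 3 (assumption), order the remaining groups by latest timestamp, newest first.
    kept.sort(key=lambda doc: parse_ts(doc["@timestamp"]), reverse=True)
    return [doc["domain_key"] for doc in kept[:size]]


print(latest_non_deleted_keys(docs))
# ['3zit1iG4', 'MPfyjQrP', 'vBYwrqvO']
```

Note that filtering out delete_yn = Y before step 1 would give a different result: PElvL8li would still appear, represented by its older delete_yn = N document.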
Is this filtering possible with a query?