I want to know how many payments in my environment are of a given status, at a given time broken down by direction, and protocol.
In our case, a payment is a collection of transactions (docs) with the same paymentId. The document with the most recent timestamp is considered to be the 'current' status.
Here's some samples of the documents
{
"status": "Pending",
"paymentId": "a1",
"@timestamp": "2021-09-08T14:10:01",
"direction": "Inbound",
"protocol": "SWF"
},
{
"status": "Exception",
"paymentId": "a1",
"@timestamp": "2021-09-08T15:10:01",
"direction": "Inbound",
"protocol": "SWF"
},
{
"status": "Completed",
"paymentId": "a1",
"@timestamp": "2021-09-08T16:10:01",
"direction": "Inbound",
"protocol": "SWF"
}
There might be more than 20,000+ payments of a specific type in a search window (as it is defined by business uses on demand).
Below is the aggregation I put together for this problem:
GET test-pmt/_search
{
"size": 0,
"aggs": {
"by_direction": {
"terms": {
"field": "direction.keyword",
"size": 2
},
"aggs": {
"by_protocol": {
"terms": {
"field": "protocol.keyword",
"size": 3
},
"aggs": {
"by_pmt": {
"terms": {
"field": "paymentId.keyword",
"size": 20000
},
"aggs": {
"by_recent_timestamp": {
"terms": {
"field": "@timestamp",
"size": 1,
"order": {
"_key": "desc"
}
},
"aggs": {
"by_status": {
"terms": {
"field": "status.keyword",
"size": 1
}
}
}
}
}
}
}
}
}
}
}
}
All of this comes down to a few questions:
- Is the 20000 bucket limit on a term aggregation for the entire payload, or just the local aggregation? (In other words, is the limit 20000, or is it parent buckets * child buckets * child sub-buckets?)
- Am I going about solving this problem the right way? This is a fairly common use-case, and while transforms will essentially dismiss some of the problems here, it doesn't allow for the variability of searching within different time windows.