Aggregate latest values of documents

I have items that over the time can go from one category to the other.
Each change results in a new document with current state of the item.

I'm tying to get run basic analytics based on the latest state of the item like how many of them are in certain category or sum of values.

Basically I'm trying to collapse and aggregate results. But that's not possible.

Using PPL it's super simple, but does not scale, say few million docs.

search source=test-stats
| sort -timestamp
| dedup 1 id
| stats sum(value) as sum_value, COUNT(id) by category

My test docs:

{"_id": "doc-1-1", "id": "doc-1", "category": "c1", "value": 1010.0, "timestamp": "2023-07-07T00:00:00.000Z"}
{"_id": "doc-1-2", "id": "doc-1", "category": "c1", "value": 1012.0, "timestamp": "2023-07-08T00:00:00.000Z"}
{"_id": "doc-1-3", "id": "doc-1", "category": "c2", "value": 1013.0, "timestamp": "2023-07-09T00:00:00.000Z"}
{"_id": "doc-2-1", "id": "doc-2", "category": "c2", "value": 1021.0, "timestamp": "2023-07-07T00:00:00.000Z"}
{"_id": "doc-2-2", "id": "doc-2", "category": "c3", "value": 1022.0, "timestamp": "2023-07-10T00:00:00.000Z"}
{"_id": "doc-2-3", "id": "doc-2", "category": "c4", "value": 1023.0, "timestamp": "2023-07-11T00:00:00.000Z"}
{"_id": "doc-3-1", "id": "doc-3", "category": "c2", "value": 1031.0, "timestamp": "2023-07-07T00:00:00.000Z"}
{"_id": "doc-3-2", "id": "doc-3", "category": "c2", "value": 1032.0, "timestamp": "2023-07-10T00:00:00.000Z"}
{"_id": "doc-3-3", "id": "doc-3", "category": "c2", "value": 1033.0, "timestamp": "2023-07-11T00:00:00.000Z"}

Expected result would be:

sum_value  COUNT(id)  category
1023       1          c4
2046       2          c2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.