I have a users index. Users have various status "New", "Waiting", "Completed". A status can go from "Completed" to "New" again. So in time series it would look something like
user_id, status, timestamp
1 NEW 2023-06-30 5:00
1 WAITING 2023-06-30 6:00
1 COMPLETED 2023-06-30 7:00
1 NEW 2023-06-30 8:00
2 NEW 2023-06-30 5:00
2 WAITING 2023-06-30 6:00
2 COMPLETED 2023-06-30 7:00
I have millions of users. I would like to get number of users who are correctly in NEW state. The result should be 1 but with pivot aggregations I get count = 2 because userId:1 was COMPLETED at some point but then back to "NEW".
Latest transforms are too expensive usage wise, so we want to run these queries on demand. Is collapse or pipeline aggregations possible? Could someone help me here.