Filter each bucket based on previous buckets

Hello,
I have documents representing user requests; each document contains a user_id and a timestamp.
A simple date histogram aggregation gives the number of daily users, like this:

{
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",
        "min_doc_count": 0
      },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "user_id.keyword"
          }
        }
      }
    }
  },
  "size": 0
}

This will generate a graph for total daily users.
I want to make graphs for new users and returning users.
For new users: filter each bucket so that it does not count documents whose user_id was already counted in a previous bucket.
For returning users: filter each bucket so that it counts only documents whose user_id was already counted in a previous bucket.
How can I do this?

“New users” on an event-centric index ticks all the boxes for problems:

If you create an entity-centric index for users that holds the date of each user's first sighting, you can run a “new users” query efficiently with a simple date histogram on that firstSighted field.
To get “returning users”, hit your event store with your example query that has the cardinalities and subtract the “new users” value for each day (this may require some work in your client).
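
As a rough illustration of the client-side step, here is a minimal Python sketch. It assumes a hypothetical entity-centric index with one document per user and a firstSighted date field, queried with the date histogram shown in NEW_USERS_QUERY. The aggregation name new_users_per_day and the helper functions are made up for this example; the "2" and "1" aggregation names come from the query in the question. The cardinality aggregation is approximate, so the subtraction is clamped at zero.

```python
from typing import Dict, Optional

# Hypothetical query for the assumed entity-centric users index.
# One document per user, so doc_count per day equals new users per day.
NEW_USERS_QUERY = {
    "size": 0,
    "aggs": {
        "new_users_per_day": {  # aggregation name is an assumption
            "date_histogram": {
                "field": "firstSighted",
                "interval": "1d",
                "min_doc_count": 0,
            }
        }
    },
}


def daily_counts(response: dict, agg_name: str,
                 metric: Optional[str] = None) -> Dict[int, int]:
    """Map each bucket key (epoch millis) to a sub-metric value or doc_count."""
    counts = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        if metric is not None:
            counts[bucket["key"]] = bucket[metric]["value"]
        else:
            counts[bucket["key"]] = bucket["doc_count"]
    return counts


def returning_users(total: Dict[int, int], new: Dict[int, int]) -> Dict[int, int]:
    """Returning users per day = total distinct users minus new users."""
    # Clamp at zero because cardinality values are approximate.
    return {day: max(count - new.get(day, 0), 0) for day, count in total.items()}
```

With the response from the event index in events_resp and the response from the users index in users_resp, returning_users(daily_counts(events_resp, "2", "1"), daily_counts(users_resp, "new_users_per_day")) gives the per-day returning-user counts.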


Thank you for your reply, @Mark_Harwood. The entity-centric index approach is the solution.
