Document count: ~10 billion
Elasticsearch disk usage: ~2 TB
Elasticsearch 7.8 via the NEST (C#) API.
My model is a transaction receipt. A simplified schema:
[Keyword]
public string customer_id { get; set; } // cardinality ~100M
[Keyword]
public string employee_id { get; set; } // cardinality ~5M
[Keyword]
public string store_zip { get; set; } // cardinality ~100k
[Keyword]
public List<string> codes { get; set; } // up to 25 per receipt, cardinality ~300k
For this phase I want term aggregation counts per customer or per employee. I believe a composite aggregation can give me this 'pivot' from transactions to customers or employees.
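A minimal sketch of what such a composite aggregation request could look like, written as a plain Python dict so the JSON body is easy to read (the same body can be expressed through NEST). The helper name `composite_counts_body` and the aggregation names `by_pivot`, `pivot` and `term_counts` are placeholders chosen for illustration; only the field names come from the schema above.

```python
def composite_counts_body(pivot_field, term_field, page_size=1000,
                          after_key=None, top_n=10):
    """Build a search body that pages over one bucket per pivot value
    (e.g. customer_id) and counts the terms of another field inside each."""
    composite = {
        "size": page_size,  # buckets returned per page
        "sources": [{"pivot": {"terms": {"field": pivot_field}}}],
    }
    if after_key is not None:
        # Resume from the "after_key" returned with the previous page.
        composite["after"] = after_key
    return {
        "size": 0,  # no hits needed, only aggregation buckets
        "aggs": {
            "by_pivot": {
                "composite": composite,
                "aggs": {
                    "term_counts": {
                        "terms": {"field": term_field, "size": top_n}
                    }
                },
            }
        },
    }


# First page of per-customer code counts.
body = composite_counts_body("customer_id", "codes", page_size=500)
```

Each response page carries an `after_key`; passing it back as `after_key` fetches the next page until no buckets are returned.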
I have one use case that I cannot figure out: filtering to customers who have made 2 or more purchases from a particular set of codes.
As an example:
Filter to customers who have made >= 2 purchases with codes { "A1", "A2", "A3" }
then
Aggregate per customer by top 4 zip codes
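To pin down the intended result, here is a small in-memory reference of the same logic in Python. It is not an Elasticsearch query, only a statement of the expected output. It assumes a receipt "matches" when it carries at least one of the listed codes, and that the top zip codes are counted over the matching receipts only; both are assumptions, and the function name and sample data are made up for illustration.

```python
from collections import Counter, defaultdict


def top_zips_for_repeat_buyers(receipts, wanted_codes, min_purchases=2, top_n=4):
    """Return {customer_id: [(store_zip, count), ...]} for customers with
    at least `min_purchases` receipts containing any of `wanted_codes`."""
    wanted = set(wanted_codes)
    zips_by_customer = defaultdict(Counter)
    for receipt in receipts:
        # Assumption: one shared code is enough for a receipt to match.
        if wanted.intersection(receipt["codes"]):
            zips_by_customer[receipt["customer_id"]][receipt["store_zip"]] += 1

    result = {}
    for customer, zips in zips_by_customer.items():
        # Total matching receipts for this customer across all zip codes.
        if sum(zips.values()) >= min_purchases:
            result[customer] = zips.most_common(top_n)
    return result


receipts = [
    {"customer_id": "c1", "store_zip": "10001", "codes": ["A1", "B7"]},
    {"customer_id": "c1", "store_zip": "10001", "codes": ["A3"]},
    {"customer_id": "c1", "store_zip": "94105", "codes": ["Z9"]},
    {"customer_id": "c2", "store_zip": "60601", "codes": ["A2"]},
]
result = top_zips_for_repeat_buyers(receipts, ["A1", "A2", "A3"])
print(result)  # {'c1': [('10001', 2)]}
```

Customer c2 is dropped because only one receipt matches, and c1's third receipt is ignored because it has none of the listed codes.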
Any guidance would be helpful.
Thank you in advance for your time.