A customer_account can order items at any time, so the raw data look like this:
order_id
@timestamp
product_id
product_name
quantity
price
... etc
I need a final index where I can track customer behavior on a daily basis, so my final index looks like this:
customer_account
@timestamp (yyyy-MM-dd)
number_of_orders
total_prices
first_order_of_customer
My issue is with first_order_of_customer: I can only get the customer's first order within each day.
Is there any way to include the customer's real (overall) first_order_of_customer in each day's aggregated data?
Example: customer 1000087 ordered for the first time on 2022-12-03.
I need first_order_of_customer = 2022-12-03 in the aggregated data for day 2022-12-13 (if the customer ordered something on 2022-12-13).
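To make the desired result concrete, here is a minimal plain-Python sketch of the aggregation described above. It does not use Elasticsearch at all; the sample orders and the function name daily_summary are hypothetical and only illustrate that the first order must be computed over all of a customer's orders, not per daily bucket.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample orders: (customer_account, order_date, price)
orders = [
    ("1000087", date(2022, 12, 3), 10.0),
    ("1000087", date(2022, 12, 13), 5.0),
    ("1000087", date(2022, 12, 13), 7.5),
]


def daily_summary(orders):
    """Aggregate orders per (customer, day) and attach the global first order."""
    # Global first order per customer, computed over ALL orders.
    first_order = {}
    for customer, day, _price in orders:
        if customer not in first_order or day < first_order[customer]:
            first_order[customer] = day

    # Daily buckets, as a date_histogram + terms group_by would produce.
    buckets = defaultdict(lambda: {"number_of_orders": 0, "total_prices": 0.0})
    for customer, day, price in orders:
        bucket = buckets[(customer, day)]
        bucket["number_of_orders"] += 1
        bucket["total_prices"] += price

    # Attach the global first order to every daily bucket.
    return {
        key: {**bucket, "first_order_of_customer": first_order[key[0]]}
        for key, bucket in buckets.items()
    }


summary = daily_summary(orders)
```

Here the bucket for 2022-12-13 carries first_order_of_customer = 2022-12-03, which is what a single daily transform cannot produce on its own.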
With the date_histogram in group_by, you cut off the necessary documents: each daily bucket only sees that day's orders.
I can imagine having another transform just for maintaining a first_order_of_customer index, and using enrich (via an ingest pipeline on dest) in your daily transform. Instead of enrich, you could also have a third transform that combines both; in this case, however, enrich seems simpler and sufficient.
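A rough sketch of how the pieces could fit together, written as Python dicts that mirror the JSON request bodies. All index, policy, and pipeline names (orders, customer_first_order, daily_orders, first-order-policy, add-first-order) and the field first_order_in_bucket are assumptions for illustration, as is summing price for total_prices; adapt them to your mapping.

```python
import json

# Transform 1 (assumed names): one document per customer with the earliest order.
first_order_transform = {
    "source": {"index": "orders"},
    "dest": {"index": "customer_first_order"},
    "pivot": {
        "group_by": {
            "customer_account": {"terms": {"field": "customer_account"}}
        },
        "aggregations": {
            "first_order_of_customer": {"min": {"field": "@timestamp"}}
        },
    },
}

# Enrich policy that looks up the first order by customer_account.
enrich_policy = {
    "match": {
        "indices": "customer_first_order",
        "match_field": "customer_account",
        "enrich_fields": ["first_order_of_customer"],
    }
}

# Ingest pipeline with an enrich processor; the matched lookup document
# is stored as an object under target_field.
pipeline = {
    "processors": [
        {
            "enrich": {
                "policy_name": "first-order-policy",
                "field": "customer_account",
                "target_field": "first_order_lookup",
            }
        }
    ]
}

# Transform 2: the daily aggregation, writing through the pipeline above.
daily_transform = {
    "source": {"index": "orders"},
    "dest": {"index": "daily_orders", "pipeline": "add-first-order"},
    "pivot": {
        "group_by": {
            "customer_account": {"terms": {"field": "customer_account"}},
            "@timestamp": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": "1d"}
            },
        },
        "aggregations": {
            "number_of_orders": {"value_count": {"field": "order_id"}},
            "total_prices": {"sum": {"field": "price"}},
            # Per-day first order, usable as a fallback for new customers.
            "first_order_in_bucket": {"min": {"field": "@timestamp"}},
        },
    },
}

# Serialized request body for the daily transform.
daily_transform_json = json.dumps(daily_transform, indent=2)
```

The bodies would be sent with PUT _transform/<transform_id>, PUT _enrich/policy/<policy_name>, and PUT _ingest/pipeline/<pipeline_id> respectively; the enrich policy has to be executed before the pipeline can match anything.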
Thanks @Hendrik_Muhs for your feedback
My only issue with the enrich pipeline is that it requires re-executing the enrich policy each time the enrich source index is updated. In my case the enrich index (customer_account + first_order_of_customer) is dynamic, not static.
I was thinking that the first_order_of_customer index might change rarely, so it might be OK to re-create this lookup index daily.
In the transform where you have enrich, you could fall back to the first order in the bucket if enrich doesn't match anything, as that indicates a new customer.
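The fallback logic could look roughly like this. It is a plain-Python illustration rather than an actual ingest processor; resolve_first_order, the lookup dict, and the field name first_order_in_bucket are hypothetical stand-ins for the enrich lookup and the per-day min aggregation.

```python
def resolve_first_order(daily_doc, lookup):
    """Return the customer's first order date for one daily document.

    lookup stands in for the enrich index: customer_account -> first order date.
    Dates are ISO yyyy-MM-dd strings.
    """
    match = lookup.get(daily_doc["customer_account"])
    if match is not None:
        # Known customer: use the value from the (possibly day-old) lookup.
        return match
    # No match means the customer is new since the lookup was last rebuilt,
    # so the first order in this bucket is the first order overall.
    return daily_doc["first_order_in_bucket"]


# Lookup rebuilt yesterday: knows 1000087, but not the new customer 2000001.
lookup = {"1000087": "2022-12-03"}

known = {"customer_account": "1000087", "first_order_in_bucket": "2022-12-13"}
new = {"customer_account": "2000001", "first_order_in_bucket": "2022-12-13"}

known_first = resolve_first_order(known, lookup)
new_first = resolve_first_order(new, lookup)
```

With this fallback, a lookup index that is only refreshed once a day still yields the right value for customers who placed their very first order after the last refresh.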
Anyway, the only other alternative I see is using another transform that merges both the first_order_of_customer and your daily_orders index into one.