# Linear regression to forecast demand

(clandestino_bgd-2) #1

Hello,
I am trying to come up with an optimal way to forecast the purchase time for Product P and User U.
We currently index these events in ES (pushed by the e-commerce system):

orderId,user,product,quantity,time,days
"order1","U","P",1,"2017-01-01",17167
"order2","U","P",2,"2017-01-29",17195
"order3","U","P",3,"2017-04-02",17258
"order4","U","P",1,"2017-07-06",17353
"order5","U","P",2,"2017-08-04",17382

where days is just an integer giving the number of days since 1970-01-01 for the event time.
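
For illustration, the `days` value can be derived from the `time` field as shown in this minimal sketch. The helper names `to_epoch_days` and `from_epoch_days` are hypothetical and not part of any e-commerce or ES API; the sketch assumes `time` is an ISO date without a time zone.

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def to_epoch_days(iso_date: str) -> int:
    """Return the number of whole days between 1970-01-01 and iso_date."""
    return (date.fromisoformat(iso_date) - EPOCH).days

def from_epoch_days(days: int) -> date:
    """Inverse conversion: turn a day count back into a calendar date."""
    return date.fromordinal(EPOCH.toordinal() + days)

print(to_epoch_days("2017-01-01"))   # 17167
print(from_epoch_days(17258))        # 2017-04-02
```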

What I want to predict is the next time of purchase and the quantity.
The quantity is the last purchased quantity (2 in this case),
and the forecast time should be somewhere in October.

I've played with a linear regression plugin for ES, and it works well if I have an additional calculated field "lag" for each event, which denotes the time period until the NEXT purchase. The data above would then look like:

orderId,user,product,quantity,time,days,lag
"order1","U","P",1,"2017-01-01",17167,28
"order2","U","P",2,"2017-01-29",17195,63
"order3","U","P",3,"2017-04-02",17258,95
"order4","U","P",1,"2017-07-06",17353,29
"order5","U","P",2,"2017-08-04",17382,?
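
The lag column above is simply the difference between consecutive `days` values for one User/Product pair. A minimal sketch, assuming the events of a single pair are already available as a list of day numbers (`with_lags` is a hypothetical helper name):

```python
def with_lags(days):
    """Pair each purchase day with the gap to the next purchase.

    The most recent purchase has no successor yet, so its lag is None.
    """
    ordered = sorted(days)
    lags = [nxt - cur for cur, nxt in zip(ordered, ordered[1:])]
    return list(zip(ordered, lags + [None]))

rows = with_lags([17167, 17195, 17258, 17353, 17382])
for day, lag in rows:
    print(day, lag)
```

For the sample data this yields lags of 28, 63, 95 and 29, with the last event left open.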

Of course, in this index I will have 10K different products and 1M different users.
My first question is: how do I update this field on the LAST event when a new event comes in?
Is it possible to do it at index time?
Does this make sense at all, or is there a better way?
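
One possible approach, sketched here under assumptions rather than as a confirmed solution: when a new event arrives, the indexing client looks up the previous event for the same pair and sends both the new document and a partial update of the previous one in a single `_bulk` request. The index name `orders`, the use of `orderId` as the document `_id`, and the helper `bulk_body` are all assumptions for illustration; depending on the Elasticsearch version, the action lines may also need a `_type`.

```python
import json

def bulk_body(index, prev_event, new_event):
    """Build an NDJSON _bulk body: index the new event, patch lag on the previous one."""
    lines = [
        {"index": {"_index": index, "_id": new_event["orderId"]}},
        new_event,
    ]
    if prev_event is not None:
        # The lag of the previous event becomes known only now.
        lag = new_event["days"] - prev_event["days"]
        lines += [
            {"update": {"_index": index, "_id": prev_event["orderId"]}},
            {"doc": {"lag": lag}},
        ]
    # The bulk API expects one JSON object per line and a trailing newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"

prev = {"orderId": "order4", "days": 17353}
new = {"orderId": "order5", "user": "U", "product": "P",
       "quantity": 2, "time": "2017-08-04", "days": 17382}
body = bulk_body("orders", prev, new)
print(body)
```

The previous event still has to be found first (for example, the latest event for the pair sorted by `days`), so this is a client-side step rather than something done purely inside the index.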
Btw, in case there is only one purchase, I'd use the default lifecycle of the product in days (it comes from the e-commerce system as well). But for cases where there is a buying pattern (at least 2 events), I'd need to use user-specific data.
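
The fallback rule can be sketched as follows. This is only one reading of it, not the plugin's method: with a single purchase it uses the product's default lifecycle, with two purchases it repeats the only known lag, and with more it fits an ordinary least-squares line through the lags (indexed 0, 1, 2, ...) and extrapolates one step. The function name `forecast_next_day` and the `default_lifecycle` value of 30 are assumptions.

```python
def forecast_next_day(days, default_lifecycle):
    """Forecast the next purchase day (in epoch days) for one User/Product pair.

    Assumes `days` holds at least one purchase day.
    """
    ordered = sorted(days)
    lags = [b - a for a, b in zip(ordered, ordered[1:])]
    if not lags:
        # Single purchase: no pattern yet, use the product default.
        next_lag = default_lifecycle
    elif len(lags) == 1:
        # Two purchases: repeat the only observed lag.
        next_lag = lags[0]
    else:
        # Least-squares fit of lag against its position, extrapolated one step.
        n = len(lags)
        mean_x = (n - 1) / 2
        mean_y = sum(lags) / n
        sxx = sum((i - mean_x) ** 2 for i in range(n))
        sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(lags))
        slope = sxy / sxx
        next_lag = mean_y + slope * (n - mean_x)
    return ordered[-1] + next_lag

print(forecast_next_day([17167, 17195, 17258, 17353, 17382], 30))  # 17444.5
```

For the sample data the extrapolated lag is 62.5 days, which puts the forecast in early October 2017. A fitted line can produce a very small or negative lag for irregular histories, so a real job would probably need a lower bound.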
I plan to run a forecast query for each User/Product pair every hour to calculate the next forecast time (effectively, when the user SHOULD run out of supply).
What would be the way to optimize that (and avoid doing this one by one)?
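
As a rough illustration of avoiding per-pair queries, the sketch below makes a single pass over all events and keeps only three numbers per pair: first day, last day and event count. It deliberately swaps the regression for a simpler mean interval, `(last - first) / (count - 1)`, because that needs no per-event lag field. The function `forecast_all` and the `lifecycles` mapping (product to default lifecycle in days) are hypothetical.

```python
def forecast_all(events, lifecycles):
    """Forecast the next purchase day for every (user, product) pair in one pass."""
    stats = {}  # (user, product) -> [first_day, last_day, count]
    for e in events:
        key = (e["user"], e["product"])
        s = stats.get(key)
        if s is None:
            stats[key] = [e["days"], e["days"], 1]
        else:
            s[0] = min(s[0], e["days"])
            s[1] = max(s[1], e["days"])
            s[2] += 1

    forecasts = {}
    for (user, product), (first, last, count) in stats.items():
        if count >= 2:
            gap = (last - first) / (count - 1)   # mean interval between purchases
        else:
            gap = lifecycles[product]            # single purchase: product default
        forecasts[(user, product)] = last + gap
    return forecasts

events = [{"user": "U", "product": "P", "days": d}
          for d in (17167, 17195, 17258, 17353, 17382)]
events.append({"user": "V", "product": "P", "days": 17300})
result = forecast_all(events, {"P": 30})
print(result)
```

Since only a minimum, a maximum and a count per pair are needed, the same numbers could in principle come from one grouped aggregation over the index instead of one query per pair.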