Entity Centric Architecture

I'm not clear on why the code that sets "isDirty" doesn't also perform the percentage calculation, but I suspect I'm missing something about the design.

Actually, you are right; I hadn't thought about it. When I make the percentage calculation, the upsert script does have the REAL, updated values, right? After all, the document is re-indexed on each update.
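To make the suggestion concrete, here is a minimal sketch of "recompute in the same step that applies the update" in plain Python, so the logic is visible without tying it to a particular client or script language. It assumes, hypothetically, that the percentage is the share of users who watched a product and then bought it; the `ProductEntity` type, its fields and `apply_event` are illustrative names, not part of any real API. In Elasticsearch this logic would live in the upsert script itself.

```python
from dataclasses import dataclass, field


@dataclass
class ProductEntity:
    """Hypothetical entity document: one per product."""
    product_id: str
    watched: set = field(default_factory=set)  # user IDs that watched the product
    bought: set = field(default_factory=set)   # user IDs that bought the product
    conversion_pct: float = 0.0                 # assumed metric: % of watchers who bought


def apply_event(doc: ProductEntity, user_id: str, action: str) -> ProductEntity:
    """Apply one event and recompute the percentage in the same step.

    Because the whole document is rewritten on every update, the values seen
    here are the current ones, so no separate "isDirty" pass is needed.
    """
    if action == "watch":
        doc.watched.add(user_id)
    elif action == "buy":
        doc.bought.add(user_id)
    else:
        raise ValueError(f"unknown action: {action}")
    converted = doc.bought & doc.watched  # watchers who also bought
    doc.conversion_pct = 100.0 * len(converted) / len(doc.watched) if doc.watched else 0.0
    return doc
```

For example, two users watching and one of them buying leaves `conversion_pct` at 50.0. The trade-off is that every update pays for the recalculation, which is cheap here because it only touches one document.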

It's possible that the "entity" you choose to roll up might represent both an entity AND a time period, e.g. figures for product X in month Y.
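One simple way to roll up by entity and time period is to make the period part of the document ID, so that all events for product X in month Y upsert into the same document and a new month starts a fresh one. A minimal sketch; the `productId_YYYY-MM` key format is an assumption, not something prescribed by Elasticsearch.

```python
from datetime import date


def entity_key(product_id: str, day: date) -> str:
    """Build a document ID for a (product, calendar month) entity.

    Events for the same product in the same month map to the same ID,
    so an upsert keyed on this ID rolls them into one document.
    """
    return f"{product_id}_{day.year:04d}-{day.month:02d}"
```

With this scheme, `entity_key("productX", date(2015, 6, 17))` gives `"productX_2015-06"`, and any other day in June 2015 maps to the same document.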

Actually, after reconsidering the problem, I decided to try to avoid pre-calculating the bought and watched lists. I still want to create this entity index to keep related data together in each document, so the data is much less spread out, but the calculation itself will be done at query time to get more control. This is described in another thread here. I hope it works as well.
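As an illustration of what "more control at query time" could mean, here is a sketch that keeps the raw watched/bought lists in the entity document and computes the percentage only when asked, optionally restricted to a segment of users chosen per query (something a single pre-calculated value can't support). The document shape, the `users` parameter and the watched-then-bought metric are all assumptions; in Elasticsearch this would run inside a query-time script, and plain Python is used here only to show the logic.

```python
from typing import Optional


def conversion_pct(doc: dict, users: Optional[set] = None) -> float:
    """Compute the watched-to-bought percentage from raw lists at query time.

    `doc` is a hypothetical entity document with "watched" and "bought" lists of
    user IDs. `users`, if given, restricts the calculation to that segment.
    """
    watched = set(doc.get("watched", []))
    bought = set(doc.get("bought", []))
    if users is not None:
        watched &= users
        bought &= users
    if not watched:
        return 0.0
    return 100.0 * len(watched & bought) / len(watched)
```

For a document with `"watched": ["u1", "u2", "u3", "u4"]` and `"bought": ["u1", "u3"]`, the overall figure is 50.0, while the segment `{"u2", "u4"}` gives 0.0. The cost is that the work is repeated on every query instead of once per update.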

Remember, our heritage is search, and it would not be unusual to index a document with a thousand words in it, so a list of that size should be manageable. Watch out for the outlier with a million IDs, though, and have a policy for dealing with it (reject vs. truncate).
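A sketch of such a policy, applied before a list of IDs is written into an entity document. The `MAX_IDS` value is a placeholder, not a recommended limit, and keeping the newest IDs on truncation assumes the list is in arrival order; both are assumptions to adjust for your own data.

```python
from enum import Enum

MAX_IDS = 10_000  # hypothetical limit; tune to your data and cluster


class OversizePolicy(Enum):
    REJECT = "reject"      # refuse to store an oversized list
    TRUNCATE = "truncate"  # keep only the most recent IDs


class OversizeError(ValueError):
    """Raised when a list exceeds the limit under the REJECT policy."""


def enforce_limit(ids: list, policy: OversizePolicy, max_ids: int = MAX_IDS) -> list:
    """Return `ids` unchanged if within the limit, else reject or truncate it."""
    if len(ids) <= max_ids:
        return ids
    if policy is OversizePolicy.REJECT:
        raise OversizeError(f"{len(ids)} IDs exceeds the limit of {max_ids}")
    # TRUNCATE: assumes IDs are in arrival order, so the tail is the newest
    return ids[-max_ids:]
```

Rejecting keeps stored data exact but loses the outlier entirely; truncating keeps the entity searchable at the cost of incomplete figures, so which one fits depends on how the numbers are used.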

I would probably rebuild the entity index every month, using the scroll API to re-index the products that have been relevant in the last 2 months and, within each document, keeping only the actions that are still relevant.
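The pruning half of that monthly rebuild might look roughly like this. It assumes a hypothetical document shape with an `actions` list whose entries carry a `ts` timestamp, and approximates "the last 2 months" as 60 days. The scroll over the old index and the bulk write into the new one are deliberately left out, since that client code depends on your setup; `docs` simply stands in for whatever the scroll yields.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional

WINDOW = timedelta(days=60)  # "last 2 months", approximated as 60 days


def prune_entity(doc: dict, now: datetime, window: timedelta = WINDOW) -> Optional[dict]:
    """Return a copy of `doc` keeping only actions inside the window,
    or None if nothing relevant is left (the product is dropped)."""
    cutoff = now - window
    recent = [a for a in doc["actions"] if a["ts"] >= cutoff]
    if not recent:
        return None
    return {**doc, "actions": recent}  # shallow copy; the original is untouched


def rebuild(docs: Iterable[dict], now: datetime) -> Iterator[dict]:
    """Yield the pruned documents that should go into the new monthly index."""
    for doc in docs:
        pruned = prune_entity(doc, now)
        if pruned is not None:
            yield pruned
```

Writing into a fresh index each month, rather than deleting old actions in place, also means the previous index stays queryable until the new one is ready.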