Entity Centric Architecture

panda2004 · July 18, 2017, 8:35pm

I'm not clear on why the code that is setting "isDirty" is not also performing the percentage calculation but I suspect I'm missing something about the design.

Actually, you are right; I hadn't thought about it. When I make the percentage calculation, the upsert script does have the REAL updated values, right? After all, the document is re-indexed on each update.
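To illustrate doing the calculation in the same step as the update, here is a minimal Python sketch that mimics what a scripted upsert does: it receives the current document source, applies the new event, and recomputes the derived value before the document is written back. The field names (`watched`, `bought`, `bought_pct`), the event shape, and the function `apply_event` are all hypothetical and not taken from the actual mapping or script; the "percentage" here is simply assumed to be buyers relative to watchers.

```python
def apply_event(source, event):
    """Mimic a scripted upsert: mutate the current source, then derive fields.

    `source` is the document as it exists at update time (empty on first insert).
    `event` is a hypothetical dict like {"action": "watched", "user_id": "u1"}.
    """
    watched = source.setdefault("watched", [])
    bought = source.setdefault("bought", [])

    # Record the user in the list matching the action, without duplicates.
    target = bought if event["action"] == "bought" else watched
    if event["user_id"] not in target:
        target.append(event["user_id"])

    # The derived value is computed from the already-updated lists,
    # so no separate "isDirty" pass is needed in this sketch.
    source["bought_pct"] = 100.0 * len(bought) / len(watched) if watched else 0.0
    return source


doc = {}
for ev in [
    {"action": "watched", "user_id": "u1"},
    {"action": "watched", "user_id": "u2"},
    {"action": "bought", "user_id": "u1"},
]:
    doc = apply_event(doc, ev)
```

Because the derived field is recalculated from the lists as they stand after the change, it always reflects the latest state of the document.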

It's possible that the "entity" you are choosing to roll up might represent both an entity AND a time period e.g. Figures for Product X in month Y.

Actually, after reconsidering the problem, I decided to try to avoid pre-calculating the bought and watched lists. I still want to create this entity index to keep relevant data together in each document so that the data is much less spread out, but the calculation itself will be done in the query phase to give me more control. This is described in another thread here. I hope that will work as well.

Remember our heritage is search and it would not be unusual to index a document with a thousand words in it. This should be manageable. Watch for the outlier with a million IDs though and have a policy for dealing with that (reject vs truncate).
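The "reject vs truncate" policy mentioned above could be sketched roughly like this in Python. The cap of 1000, the function name `enforce_id_limit`, and the assumption that the list is ordered oldest-first are illustrative choices, not anything prescribed in this thread.

```python
MAX_IDS = 1000  # assumed cap; tune it to your own data


def enforce_id_limit(ids, max_ids=MAX_IDS, policy="truncate"):
    """Apply an outlier policy to a list of IDs before indexing.

    policy="truncate" keeps the last `max_ids` entries (assumes the list
    is ordered oldest-first); policy="reject" raises an error instead.
    """
    if len(ids) <= max_ids:
        return ids
    if policy == "truncate":
        return ids[-max_ids:]
    if policy == "reject":
        raise ValueError(f"{len(ids)} IDs exceeds limit of {max_ids}")
    raise ValueError(f"unknown policy: {policy!r}")
```

Truncating keeps the entity searchable at the cost of dropping older IDs, while rejecting surfaces the outlier so it can be handled separately.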

I would probably re-index the entity index every month: I will use the scroll API to index only products that have been relevant in the last 2 months and, within each document, only the actions that are still relevant.
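The pruning step of that monthly re-index might look something like the following Python sketch. Plain dicts stand in for documents that would, in a real job, be read via a scroll search and written back in bulk. The field names (`product_id`, `actions`, `timestamp`), the 60-day window as an approximation of "2 months", and the function `prune_entity` are assumptions for illustration.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=60)  # roughly "the last 2 months"; an assumption


def prune_entity(doc, now, retention=RETENTION):
    """Return a copy of `doc` keeping only actions newer than the cutoff,
    or None if no relevant action remains (the product is skipped)."""
    cutoff = now - retention
    recent = [a for a in doc["actions"] if a["timestamp"] >= cutoff]
    if not recent:
        return None
    return {**doc, "actions": recent}


now = datetime(2017, 7, 18)
product = {
    "product_id": "p1",
    "actions": [
        {"type": "watched", "timestamp": datetime(2017, 3, 1)},
        {"type": "bought", "timestamp": datetime(2017, 7, 1)},
    ],
}
pruned = prune_entity(product, now)  # keeps only the July action
```

Returning None for products with no recent actions lets the caller skip them, so the new index only contains entities that are still relevant.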
