Multiple writes document strategy

(avibh) #1


I have the following architecture dilemma:
my requirement is to index a lot of data, most of which is written only once.
Documents are "tagged" at index time as types X/Y/Z...
This means the system adds some metadata to each document at index time.
Unfortunately, users can also edit a document's tagging manually, which may result in reprocessing the original data and re-indexing many documents.
The system has a steady stream of input data, so it indexes all the time while users can also tamper with the data.
Up until now we had the "tagging" metadata embedded in each document, which caused a lot of reprocessing whenever a user changed something.

We're thinking of separating the tag metadata from the documents, but that will result in a semi-SQL-like DB (each document will hold a reference to its metadata).
So, my question is:
what is the best design pattern here to get the best of ES (aggregations, search, geo...) without falling too deep into the SQL-like trap?

(Jörg Prante) #2

Index the events as they come in. Avoid reindexing at all cost. A filter query to combine all events back into a "tag list" at query time is cheap.

What I don't understand is the "SQL-like trap". There is no trap. You must design the relationship between your entity definition and the entity in spacetime, i.e. the events that change the entity by appending information. That is not SQL specific.
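
A rough sketch of the append-only idea described above, in plain Python so it runs without a cluster. The event shape (`doc_id`, `seq`, `op`, `tag`) and the helper names are assumptions made for illustration, not something from this thread. Each tag change is stored as its own small event, and the current tag list is folded together at read time. `tag_events_query` shows one possible Elasticsearch query body (a `bool` query with a `term` filter) that would fetch those events, assuming the events are indexed with those field names.

```python
# Hypothetical event records: every tag change is appended, nothing is rewritten.
events = [
    {"doc_id": "d1", "seq": 1, "op": "add", "tag": "X"},
    {"doc_id": "d2", "seq": 2, "op": "add", "tag": "Y"},
    {"doc_id": "d1", "seq": 3, "op": "add", "tag": "Z"},
    {"doc_id": "d1", "seq": 4, "op": "remove", "tag": "X"},  # manual user edit
]


def current_tags(events, doc_id):
    """Replay one document's events in order to get its current tag set."""
    tags = set()
    own = (e for e in events if e["doc_id"] == doc_id)
    for ev in sorted(own, key=lambda e: e["seq"]):
        if ev["op"] == "add":
            tags.add(ev["tag"])
        elif ev["op"] == "remove":
            tags.discard(ev["tag"])
    return tags


def tag_events_query(doc_id):
    """Build a filter-only search body for one document's tag events.

    Field names are assumptions; the bool/filter/term structure is the
    standard Elasticsearch query DSL.
    """
    return {
        "query": {"bool": {"filter": [{"term": {"doc_id": doc_id}}]}},
        "sort": [{"seq": "asc"}],
    }


if __name__ == "__main__":
    print(sorted(current_tags(events, "d1")))
    print(tag_events_query("d1"))
```

With this layout a manual tag edit is one more small write, and the original document is never reindexed. The trade-off is that the tag list has to be assembled when reading.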
