Multiple writes document strategy

avibh · March 20, 2016, 3:51pm

Hi,

I have the following architecture dilemma:
My requirement is to index a lot of data, most of which is written only once.
Documents are "tagged" at index time as types X/Y/Z...
This means the system adds some metadata to each document at index time.
Unfortunately, users can also edit the document tagging manually, which may result in reprocessing of the original data and re-indexing of many documents.
The system has a steady stream of input data; it indexes all the time, while users can also tamper with the data.
Up until now we have had the "tagging" metadata embedded in each document, which caused a lot of reprocessing on user-initiated changes.

We're thinking of separating the tag metadata from the documents, but that will result in a semi-SQL-like DB (each document will hold a reference to its metadata).
So, my question is:
What is the best design pattern here to get the best of ES (aggregations, search, geo...) without falling too deep into the SQL-like trap?

jprante · March 20, 2016, 4:02pm

Index the events as they come in. Avoid reindexing at all cost. A filter query to combine all events back into a "tag list" at query time is cheap.

What I don't understand is "SQL like trap". There is no trap. You must design relationships between your entity definition and the entity in spacetime, i.e. events that change the entity by appending information. That is not SQL specific.
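
To make the append-only idea above concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `TagEvent` fields (`doc_id`, `tag`, `action`, `timestamp`) and the helper names are hypothetical and do not come from the thread. Each tag change is stored as a new event rather than as an update to the original document, and the current tag list is rebuilt by replaying the events for one document in time order. The `tag_events_query` helper only builds a request body in the Elasticsearch query DSL (a `bool` query with a `term` filter); it does not call any client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagEvent:
    """One append-only tag change (hypothetical schema, not from the thread)."""
    doc_id: str
    tag: str
    action: str      # "add" or "remove"
    timestamp: int   # any monotonically increasing value


def current_tags(events, doc_id):
    """Replay the events for one document in time order to get its tag set."""
    tags = set()
    relevant = (e for e in events if e.doc_id == doc_id)
    for ev in sorted(relevant, key=lambda e: e.timestamp):
        if ev.action == "add":
            tags.add(ev.tag)
        elif ev.action == "remove":
            tags.discard(ev.tag)
    return tags


def tag_events_query(doc_id):
    """Build a filter query body that selects all tag events for a document.

    The field names are assumptions; the structure is a standard bool/filter
    query with a term clause, sorted by timestamp.
    """
    return {
        "query": {"bool": {"filter": [{"term": {"doc_id": doc_id}}]}},
        "sort": [{"timestamp": "asc"}],
    }


# A user adds X and Y to doc-1, then removes X; nothing is reindexed.
events = [
    TagEvent("doc-1", "X", "add", 1),
    TagEvent("doc-1", "Y", "add", 2),
    TagEvent("doc-1", "X", "remove", 3),
    TagEvent("doc-2", "Z", "add", 4),
]

print(current_tags(events, "doc-1"))  # prints {'Y'}
```

The trade-off in this sketch is that a manual tag edit costs one small write, while the work of combining events moves to query time.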
