I'm looking for the optimal way to track record additions or changes.
So, say I've got data coming in. This data has timestamps, but the
timestamps don't correspond with the insertion time, and there's no
expectation for them to remain ordered.
I need to dump "new" data periodically, say every 15 minutes. There may be
new data in those 15 minutes, there may not.
Two approaches occur to me.
One: Reindex and include _timestamp as a part of the process. Grab all data
indexed since the last dump.
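
A rough sketch of what option one might look like on the dump side,
assuming _timestamp holds the insertion time in epoch milliseconds. The
function name and the cursor handling are made up for illustration; only
the range query shape is standard query DSL.

```python
def since_last_dump_query(last_dump_ms, now_ms):
    """Build a search body for documents indexed in [last_dump_ms, now_ms).

    The window is half-open, so a document stamped exactly at now_ms is
    left for the next run instead of being dumped twice.
    """
    return {
        "query": {
            "range": {
                # Assumption: _timestamp is enabled and records insertion time.
                "_timestamp": {"gte": last_dump_ms, "lt": now_ms}
            }
        }
    }


def advance_cursor(last_dump_ms, now_ms, dump_succeeded):
    """Return the cursor to persist for the next run.

    The cursor only moves forward after a successful dump, so a failed
    run is retried over the same window.
    """
    return now_ms if dump_succeeded else last_dump_ms
```

The dump process would have to persist the cursor somewhere between runs.
Since newly indexed documents only become searchable after a refresh, it
may be safer to end the window slightly before the current time.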
Two: Set a boolean value like "dumped" that will be false on index
insertion and on all current records. My dump app would just search
everything for dumped:false, dump it, and on success, set dumped:true.
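
A rough sketch of option two, assuming the flag is a plain boolean field
named "dumped" and that the flags are flipped with partial updates through
the bulk API. The helper names are made up for illustration, and the index
is assumed to be given in the bulk request URL rather than in each action.

```python
import json


def undumped_query():
    """Search body matching documents that have not been dumped yet."""
    return {"query": {"term": {"dumped": False}}}


def mark_dumped_bulk_body(doc_ids):
    """Build a newline-delimited bulk body that sets dumped to true.

    Each document gets two lines: an "update" action line naming the
    document, then a partial document carrying the new flag value.
    """
    if not doc_ids:
        return ""
    lines = []
    for doc_id in doc_ids:
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({"doc": {"dumped": True}}))
    # The bulk format expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"
```

The dump app would run the search, write out the hits, and only send the
bulk body for the IDs it actually wrote, so a failed dump leaves those
documents at dumped:false for the next run.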
Is there another approach that's better? Is either option 1 or option 2
above preferable? The boolean option seems appealing since there's no need
to keep track in the dump process of the last time a dump occurred. Maybe
I'll do both: for cases like "Oh hey, data for time period XYZ->ABC didn't
dump properly!" it would be nice to have explicit timestamps of when the
data was inserted, without needing to rely on them for the dump process.