If I got it right, with the new TTL feature a timestamp of creation is
tagged onto each document.
If that is right, can't we create a river for Elasticsearch based on this?
That is, each time we poll Elasticsearch for changes, we would ask it:
"provide me all changes since this timestamp".
There might be a few overlapping documents for that millisecond, but we
can discard those in the river code.
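To make the idea concrete, here is a minimal sketch of such a poll loop in
Python against the REST API. It assumes the automatic _timestamp field is
enabled in the mapping (as discussed below); the host, index name, and the
poll_changes helper are invented for illustration. Overlap at the boundary
millisecond is discarded by remembering the document IDs already seen at
that timestamp:

import time
import requests  # plain HTTP client, just for illustration

ES = "http://localhost:9200"   # invented host
INDEX = "myindex"              # invented index name

def poll_changes(since_ms, seen_ids):
    """Fetch docs whose _timestamp >= since_ms, discarding the ones
    already seen at the boundary millisecond."""
    body = {
        "query": {"range": {"_timestamp": {"gte": since_ms}}},
        "sort": [{"_timestamp": "asc"}],
        "size": 500,
    }
    hits = requests.get(ES + "/" + INDEX + "/_search",
                        json=body).json()["hits"]["hits"]
    new_docs, last_ts = [], since_ms
    for h in hits:
        ts = h["sort"][0]                     # _timestamp via the sort key
        if ts == since_ms and h["_id"] in seen_ids:
            continue                          # overlap from the last poll
        new_docs.append(h)
        last_ts = ts
    # remember IDs sitting on the new boundary for the next poll
    if last_ts == since_ms:
        boundary = seen_ids | {h["_id"] for h in new_docs}
    else:
        boundary = {h["_id"] for h in new_docs if h["sort"][0] == last_ts}
    return new_docs, last_ts, boundary

since, seen = 0, set()
while True:
    docs, since, seen = poll_changes(since, seen)
    for d in docs:
        print(d["_id"])                       # hand off to the river here
    time.sleep(5)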
If you don't provide a timestamp when indexing/updating a document and you
enable the timestamp in the mapping, then each document will be associated
with a timestamp corresponding to its indexing time.
The timestamp field is indexed by default and, if I remember well, can be
queried as a date field. So if you have synchronized time across your
cluster, you are able to query all the docs that have changed since a
specified moment.
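If it helps, enabling that looks roughly like the following (the index and
type names are placeholders, and this is from memory of the mapping API, so
treat it as a sketch rather than gospel):

import requests

ES = "http://localhost:9200"   # invented host

# enable the automatic _timestamp field for a (hypothetical) type
mapping = {"mytype": {"_timestamp": {"enabled": True}}}
requests.put(ES + "/myindex/mytype/_mapping", json=mapping)

# then _timestamp can be queried like any other date field
query = {"query": {"range": {"_timestamp": {"gte": "2011-05-01T00:00:00"}}}}
print(requests.get(ES + "/myindex/_search", json=query).json())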
I mean, if you want a "get me all since this timestamp" query in a
distributed environment and you rely on the automatic timestamp of ES, you
have to make sure that all your nodes have the same time.
But if no such workaround exists in the current implementation of deletion,
how are cases like "delete a document based on TTL even if a node was taken
out of the cluster and a new one, not time-synced with the first, was put
back" handled?
That means that if the time is not synced up and some reordering happens,
we might lose a document before it "actually" expires.
Looks like a bug to me.
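To put numbers on the concern (all of them invented): suppose a document is
indexed with a 60-second TTL on node A, and node B's clock runs two minutes
ahead. A purge running on node B would see the document as already expired
the moment it is created:

# toy arithmetic only; the numbers and clock model are invented
ttl_ms = 60 * 1000                   # document should live 60 seconds
indexed_at = 1000000                 # node A's clock at indexing time
expiry = indexed_at + ttl_ms         # expiry stamp stored with the doc

skew_ms = 120 * 1000                 # node B runs 2 minutes ahead
now_on_b = indexed_at + skew_ms      # node B's "now" at the same instant

if now_on_b >= expiry:
    print("node B purges the doc immediately, 60s before it should")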
Presumably, if you are operating a cluster, you will have all the servers
getting their time from the same NTP server and syncing on a regular basis.
I couldn't say whether it is a bug or not, but maybe you need to consider
your system/network architecture first before declaring it so.
I have operated such large-scale systems backed by NTP.
I have also seen the NTP daemon going down and not coming back.
It was also fairly common that some newly added machines didn't have NTP
running at all (yes, even after using configuration engines like cfengine,
etch, etc.).
As the risk here is deletion rather than some other trivial issue, I had
this concern.