River to Elasticsearch based on TTL

Hi,

If I understand correctly, the new TTL feature tags each document with a
timestamp of its creation.
If so, can't we create a river to Elasticsearch based on this?

That is, each time we poll Elasticsearch for changes, we ask it to
"provide me all changes since this timestamp".
There might be a (very few) overlapping documents for that millisecond,
but we can discard those in the river code.

Thanks
Vineeth
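
A minimal sketch of the polling idea described above, in Python. Everything here is hypothetical: `TimestampPoller`, the `search_since` callback and the `{"id", "ts"}` document shape are illustrative names, not part of any Elasticsearch or river API. The callback is assumed to return every document whose timestamp is greater than or equal to the given value, which is why the boundary millisecond can come back twice and has to be discarded on the river side.

```python
from typing import Callable, Dict, List, Set

Doc = Dict[str, object]  # hypothetical shape: {"id": str, "ts": int (milliseconds)}


class TimestampPoller:
    """Remembers the last timestamp fetched and asks for everything since then."""

    def __init__(self, search_since: Callable[[int], List[Doc]]) -> None:
        # search_since(ts) is assumed to return all docs with doc["ts"] >= ts.
        self.search_since = search_since
        self.last_ts = 0
        self.seen_at_last_ts: Set[object] = set()

    def poll(self) -> List[Doc]:
        fresh: List[Doc] = []
        for doc in sorted(self.search_since(self.last_ts), key=lambda d: d["ts"]):
            if doc["ts"] == self.last_ts and doc["id"] in self.seen_at_last_ts:
                continue  # overlap from the boundary millisecond: discard it
            if doc["ts"] > self.last_ts:
                # Moving to a newer millisecond: restart the boundary bookkeeping.
                self.last_ts = doc["ts"]
                self.seen_at_last_ts = set()
            self.seen_at_last_ts.add(doc["id"])
            fresh.append(doc)
        return fresh


# Usage with an in-memory list standing in for the index.
index: List[Doc] = [{"id": "a", "ts": 100}, {"id": "b", "ts": 200}]
poller = TimestampPoller(lambda ts: [d for d in index if d["ts"] >= ts])
first = poller.poll()
index += [{"id": "c", "ts": 200}, {"id": "d", "ts": 300}]
second = poller.poll()
```

Only the IDs seen at the most recent millisecond need to be remembered, so the bookkeeping stays small however many documents have been fetched.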

If you are right, it's a very good idea!

David :wink:

On 27 Oct 2011 at 16:23, Vineeth Mohan vineethmohan@algotree.com wrote:

Hi,

If I understand correctly, the new TTL feature tags each document with a timestamp of its creation.
If so, can't we create a river to Elasticsearch based on this?

That is, each time we poll Elasticsearch for changes, we ask it to "provide me all changes since this timestamp".
There might be a (very few) overlapping documents for that millisecond, but we can discard those in the river code.

Thanks
Vineeth

Hi,

If you don't provide a timestamp when indexing or updating a document, and you
enable the timestamp in the mapping, then each document is associated with
a timestamp corresponding to its indexing time.
The timestamp field is indexed by default and can be queried as a date
field, if I remember correctly. So if the clocks on your cluster are
synchronized, you can query all the docs that have
changed since a specified moment.

So I guess it could work.
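
As a rough illustration of the two pieces mentioned above (enabling the timestamp in the mapping, then querying it like a date field), here is a Python sketch that only builds the request bodies. It assumes the `_timestamp` mapping field and a `range` query with `gte`, as I recall them from Elasticsearch releases of that period; the type name `doc` and both helper functions are made up for the example, and the exact syntax should be checked against the version in use.

```python
import json
from typing import Any, Dict


def timestamp_mapping(doc_type: str) -> Dict[str, Any]:
    """Mapping body that turns on automatic timestamps (assumed `_timestamp` syntax)."""
    return {doc_type: {"_timestamp": {"enabled": True}}}


def changed_since_query(since_ms: int) -> Dict[str, Any]:
    """Search body asking for docs stamped at or after `since_ms` (epoch millis)."""
    return {"query": {"range": {"_timestamp": {"gte": since_ms}}}}


# "doc" is a hypothetical type name; 1319725380000 is just a sample cursor value.
mapping_body = timestamp_mapping("doc")
query_body = changed_since_query(1319725380000)
print(json.dumps(mapping_body))
print(json.dumps(query_body))
```

No request is sent here; the bodies would be passed to whatever client the river uses.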

We don't have to synchronize the time.

The endpoint (in this case ES) just has to remember which timestamp it
fetched last time,
then run a "get me all since this timestamp" query.

Thanks
Vineeth

On Thu, Oct 27, 2011 at 8:03 PM, Benjamin Devèze
benjamin.deveze@gmail.com wrote:

Hi,

If you don't provide a timestamp when indexing or updating a document, and you
enable the timestamp in the mapping, then each document is associated with a
timestamp corresponding to its indexing time.
The timestamp field is indexed by default and can be queried as a date
field, if I remember correctly. So if the clocks on your cluster are synchronized,
you can query all the docs that have
changed since a specified moment.

So I guess it could work.

I mean that if you want a "get me all since this timestamp" query in a
distributed environment and you rely on the automatic timestamp from ES, you
have to make sure that all your nodes have the same time.
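
To make the clock caveat above concrete, here is a small Python simulation. It is a toy model, not Elasticsearch behaviour: each node stamps documents with its own clock, `stamp`, `changed_since` and `query_start` are hypothetical helpers, and the 500 ms skew is an arbitrary figure. It shows a document being missed when a node's clock runs behind, and one possible mitigation (rewinding the cursor by an assumed worst-case skew, at the cost of more overlap to discard).

```python
from typing import Dict, List


def stamp(true_time_ms: int, node_offset_ms: int) -> int:
    """Timestamp a node assigns: true time plus that node's clock error."""
    return true_time_ms + node_offset_ms


def changed_since(docs: List[Dict[str, int]], since_ms: int) -> List[Dict[str, int]]:
    """Stand-in for the 'get me all since this timestamp' query."""
    return [d for d in docs if d["ts"] >= since_ms]


def query_start(last_ts_ms: int, max_skew_ms: int) -> int:
    """Rewind the cursor by the assumed worst-case skew so late stamps are re-read."""
    return max(0, last_ts_ms - max_skew_ms)


# Node A has an accurate clock; node B runs 500 ms behind (assumed figures).
docs = [{"id": 1, "ts": stamp(1000, 0)}]        # indexed on node A
last_ts = 1000                                  # the poller has read doc 1
docs.append({"id": 2, "ts": stamp(1200, -500)}) # indexed later, on node B

naive = changed_since(docs, last_ts)
with_margin = changed_since(docs, query_start(last_ts, 500))
```

With the naive cursor, document 2 is stamped 700 and never shows up; with the margin it is returned alongside document 1, which the river would then discard as already seen.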

Oh, OK... yes, that makes sense.

But if the current deletion implementation has no workaround for this, how are
cases like "delete a document based on TTL even though a node was taken out of
the cluster and a new one, not time-synced with the first, was put in its
place" handled?

Thanks
Vineeth

On Thu, Oct 27, 2011 at 8:58 PM, Benjamin Devèze
benjamin.deveze@gmail.com wrote:

I mean that if you want a "get me all since this timestamp" query in a
distributed environment and you rely on the automatic timestamp from ES, you
have to make sure that all your nodes have the same time.

The TTL feature assumes that the time is synced up cluster-wide.

That means that if the time is not synced up and some reordering happens,
we might lose a document before it "actually" expires.
Looks like a bug to me.

Thanks
Vineeth
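
The concern above can be sketched with a simplified model in Python. This is not how Elasticsearch implements TTL purging; `expiry_ms` and `is_expired` are hypothetical helpers and the numbers are invented. The only assumption is the one stated in the thread: expiry is computed from one clock and checked against another.

```python
def expiry_ms(index_time_ms: int, ttl_ms: int) -> int:
    """Expiry instant as computed by the node that indexed the document."""
    return index_time_ms + ttl_ms


def is_expired(expiry_at_ms: int, node_now_ms: int) -> bool:
    """Purge check as seen by whichever node currently evaluates it."""
    return node_now_ms >= expiry_at_ms


# Document indexed at true time 10,000 ms with a 60 s TTL on an accurate node.
expiry = expiry_ms(10_000, 60_000)  # 70,000 ms

# 30 s later (true time 40,000 ms) two nodes evaluate the same document.
true_now = 40_000
accurate_node = is_expired(expiry, true_now)         # clock is correct
fast_node = is_expired(expiry, true_now + 45_000)    # clock runs 45 s ahead
```

The accurate node keeps the document, while the node with the fast clock already considers it expired after only half of its TTL has elapsed.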

On Thu, Oct 27, 2011 at 9:20 PM, Benjamin Devèze
benjamin.deveze@gmail.com wrote:

The TTL feature assumes that the time is synced up cluster-wide.

On Oct 27, 5:23 pm, Vineeth Mohan vineethmo...@algotree.com wrote:

That means that if the time is not synced up and some reordering happens,
we might lose a document before it "actually" expires.
Looks like a bug to me.

Presumably if you are operating a cluster you will have all the
servers getting their time from the same NTP server and syncing on a
regular basis.

I couldn't say whether it is a bug or not but maybe you need to
consider your system/network architecture first before declaring it
so.

Ian

I have operated large-scale systems like this that are backed by NTP.
I have also seen the NTP daemon go down and not come back.
It was also fairly common for newly added machines not to have
NTP running on them (yes, even when using
configuration engines like CFEngine, etch, etc.).
Since the risk here is deletion rather than some trivial issue, I had
this concern.

Thanks
Vineeth

On Thu, Oct 27, 2011 at 10:08 PM, Electric Owl ian.lewis.65@gmail.com wrote:

On Oct 27, 5:23 pm, Vineeth Mohan vineethmo...@algotree.com wrote:

That means that if the time is not synced up and some reordering happens,
we might lose a document before it "actually" expires.
Looks like a bug to me.

Presumably if you are operating a cluster you will have all the
servers getting their time from the same NTP server and syncing on a
regular basis.

I couldn't say whether it is a bug or not but maybe you need to
consider your system/network architecture first before declaring it
so.

Ian

To add to this, I have also seen time move at a faster rate in a
virtual machine (yes, it was a bug in the application that created the
virtual machine).

Thanks
Vineeth

On Thu, Oct 27, 2011 at 10:18 PM, Vineeth Mohan
vineethmohan@algotree.com wrote:

I have operated large-scale systems like this that are backed by NTP.
I have also seen the NTP daemon go down and not come back.
It was also fairly common for newly added machines not to have
NTP running on them (yes, even when using
configuration engines like CFEngine, etch, etc.).
Since the risk here is deletion rather than some trivial issue, I had
this concern.

Thanks
Vineeth
