River to elasticSearch based on TTL

vineeth_mohan · October 27, 2011, 2:23pm

Hi,

If I understand correctly, with the new TTL feature a creation timestamp will
be attached to each document.
If so, can't we create a river to Elasticsearch based on this?

That is, each time we poll Elasticsearch for changes, we would ask it to
"provide me all changes since this timestamp".
There might be a few overlapping documents for that millisecond,
but we can discard them in the river code.

Thanks
Vineeth
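A minimal sketch of the polling idea above, assuming each document carries an integer millisecond timestamp. `Doc` and `TimestampPoller` are hypothetical names, not part of any Elasticsearch or river API, and the in-memory list stands in for the "all changes since this timestamp" query. The lower bound is inclusive so that documents indexed later within the same millisecond are not skipped; the overlap this creates is discarded by remembering which ids were already delivered at the boundary millisecond.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Doc:
    doc_id: str
    timestamp_ms: int  # creation/indexing time tagged on the document


@dataclass
class TimestampPoller:
    """Hypothetical river-side poller that keeps its own checkpoint."""
    last_ts: int = 0
    # Ids already delivered whose timestamp equals last_ts.
    seen_at_last_ts: Set[str] = field(default_factory=set)

    def poll(self, index: List[Doc]) -> List[Doc]:
        # Stand-in for the "all changes since this timestamp" query (inclusive).
        batch = [d for d in index if d.timestamp_ms >= self.last_ts]
        # Discard the overlap: docs at the boundary millisecond already sent.
        new_docs = [
            d for d in batch
            if not (d.timestamp_ms == self.last_ts
                    and d.doc_id in self.seen_at_last_ts)
        ]
        if batch:
            # Advance the checkpoint and remember every id at the new boundary.
            self.last_ts = max(d.timestamp_ms for d in batch)
            self.seen_at_last_ts = {
                d.doc_id for d in batch if d.timestamp_ms == self.last_ts
            }
        return new_docs


index = [Doc("a", 100), Doc("b", 100)]
poller = TimestampPoller()
print([d.doc_id for d in poller.poll(index)])  # ['a', 'b']
index += [Doc("c", 100), Doc("d", 101)]        # "c" lands in the same millisecond
print([d.doc_id for d in poller.poll(index)])  # ['c', 'd']
print([d.doc_id for d in poller.poll(index)])  # []
```

This only works as long as stamps never go backwards relative to the checkpoint, which depends on the clocks that assign them.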

dadoonet · October 27, 2011, 2:28pm

If you are right, it's a very good idea!

David


Benjamin_Deveze · October 27, 2011, 2:33pm

Hi,

If you enable the timestamp in the mapping and don't provide one when
indexing/updating a document, each document will be assigned a timestamp
corresponding to its indexing time.
If I remember correctly, the timestamp field is indexed by default and can
be queried as a date field. So if the clocks across your cluster are
synchronized, you can query all the docs that have changed since a given
moment.

So I guess it could work.
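The two request bodies involved can be sketched as plain Python dicts. This assumes an Elasticsearch version that still has the `_timestamp` meta-field (it was deprecated in later releases); the helper names and the `tweet` type are hypothetical, and how the bodies are sent (put-mapping and search endpoints, client library) is left out.

```python
import json


def timestamp_mapping(doc_type: str) -> dict:
    # Enables the automatic indexing-time stamp for one type (0.x-era syntax).
    return {doc_type: {"_timestamp": {"enabled": True}}}


def changed_since_query(since_ms: int) -> dict:
    # Range query on the stamp. The lower bound is inclusive, so docs sharing
    # the boundary millisecond come back again and must be de-duplicated.
    return {"query": {"range": {"_timestamp": {"gte": since_ms}}}}


print(json.dumps(timestamp_mapping("tweet")))
print(json.dumps(changed_since_query(1319725380000)))
```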

vineeth_mohan · October 27, 2011, 2:46pm

We don't have to synchronize the time.

The endpoint (in this case ES) will have to remember up to which timestamp
it fetched last time, and then do a "get me all since this timestamp" query.

Thanks
Vineeth


Benjamin_Deveze · October 27, 2011, 3:28pm

I mean, if you want a "get me all since this timestamp" query in a
distributed environment and you rely on ES's automatic timestamps, you have
to make sure that all your nodes have the same time.
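A toy model of the failure described above, with made-up clock offsets: each node stamps documents with its own local clock, and the poller keeps the highest timestamp it has seen. A document indexed later on a node whose clock lags gets an older stamp than the checkpoint and is never returned. The function names are illustrative only.

```python
from typing import List, Tuple


def stamp(true_time_ms: int, clock_offset_ms: int) -> int:
    """Timestamp a node assigns: its local clock is true time plus an offset."""
    return true_time_ms + clock_offset_ms


def changed_since(docs: List[Tuple[str, int]], checkpoint_ms: int) -> List[str]:
    """Ids of docs whose stored timestamp is at or after the checkpoint."""
    return [doc_id for doc_id, ts in docs if ts >= checkpoint_ms]


NODE_A_OFFSET = 0     # node A's clock is correct
NODE_B_OFFSET = -500  # node B's clock lags by 500 ms (hypothetical)

docs = [("a1", stamp(1000, NODE_A_OFFSET))]      # stored timestamp 1000
checkpoint = max(ts for _, ts in docs)            # poller remembers 1000
docs.append(("b1", stamp(1100, NODE_B_OFFSET)))   # indexed later, stored as 600

print(changed_since(docs, checkpoint))  # ['a1']: "b1" is silently skipped
```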

vineeth_mohan · October 27, 2011, 3:32pm

Oh OK... yes, that makes sense.

But if there is no workaround for this in the current implementation of
deletion, how are cases like "delete a document based on TTL even if a
node was taken out of the cluster and a new one was put back that is not
time-synced with the first one" handled?

Thanks
Vineeth


Benjamin_Deveze · October 27, 2011, 3:50pm

The TTL feature assumes that time is synced cluster-wide.

vineeth_mohan · October 27, 2011, 4:23pm

That means that if the time is not synced and some reordering happens,
we might lose a document before it "actually" expires.
Looks like a bug to me.

Thanks
Vineeth
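The early-deletion risk can be put in numbers with a simplified model (not Elasticsearch's actual purge code): the expiry time is the stamp plus the TTL, and whichever node checks expiry compares it against its own clock. A clock that runs fast by some offset deletes the document that much early. The function and values are hypothetical.

```python
def purge_would_delete(stamped_ms: int, ttl_ms: int,
                       true_now_ms: int, checker_offset_ms: int) -> bool:
    """True if a node with the given clock offset considers the doc expired."""
    expires_at = stamped_ms + ttl_ms           # absolute expiry time
    local_now = true_now_ms + checker_offset_ms
    return local_now >= expires_at


# Stamped at 10_000 ms with a 60 s TTL: it should live until true time 70_000.
print(purge_would_delete(10_000, 60_000, 65_000, 0))      # False
print(purge_would_delete(10_000, 60_000, 65_000, 5_000))  # True: deleted 5 s early
```

Likewise, a stamp assigned by a node whose clock lags moves the expiry earlier by the same amount.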


Electric_Owl · October 27, 2011, 4:38pm


Presumably if you are operating a cluster you will have all the
servers getting their time from the same NTP server and syncing on a
regular basis.

I couldn't say whether it is a bug or not, but maybe you need to
consider your system/network architecture first before declaring it
so.

Ian


vineeth_mohan · October 27, 2011, 4:48pm

I have operated such large-scale systems backed by NTP.
I have also seen the NTP daemon go down and never come back.
It was also fairly common for some newly added machines not to have
NTP running on them (yes, even when using
configuration engines like CFEngine, etch, etc.).
Since the risk here is deletion rather than some more trivial issue, I
raised this concern.

Thanks
Vineeth


vineeth_mohan · October 27, 2011, 4:53pm

To add to this, I have also seen time moving at a faster rate in a
virtual machine (yes, it was a bug in the virtual machine creation
application).

Thanks
Vineeth

