River to elasticSearch based on TTL


(vineeth mohan) #1

Hi,

If I understand correctly, the new TTL feature tags each document with a
creation timestamp.
If so, can't we build a river into Elasticsearch based on it?

That is, each time we poll Elasticsearch for changes, we ask it to
"provide me all changes since this timestamp".
A few documents sharing that exact millisecond might overlap between polls,
but the river code can discard them.
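
A minimal sketch of that polling idea, assuming an in-memory list stands in for the index. `Doc`, `ChangePoller` and the field names are hypothetical and are not part of any river API; the point is only to show the checkpoint and how documents from the boundary millisecond get discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    timestamp_ms: int  # indexing time in milliseconds


class ChangePoller:
    """Hypothetical poller that remembers the last timestamp it saw."""

    def __init__(self):
        self.last_ts = -1             # checkpoint: highest timestamp seen so far
        self.seen_at_last_ts = set()  # ids already emitted for that millisecond

    def poll(self, index):
        # "Provide me all changes since this timestamp", boundary included.
        batch = sorted(
            (d for d in index if d.timestamp_ms >= self.last_ts),
            key=lambda d: d.timestamp_ms,
        )
        fresh = []
        for doc in batch:
            if doc.timestamp_ms == self.last_ts and doc.doc_id in self.seen_at_last_ts:
                continue  # overlap from the boundary millisecond, discard it
            if doc.timestamp_ms > self.last_ts:
                # Move the checkpoint forward and start a new boundary set.
                self.last_ts = doc.timestamp_ms
                self.seen_at_last_ts = set()
            self.seen_at_last_ts.add(doc.doc_id)
            fresh.append(doc)
        return fresh


index = [Doc("a", 100), Doc("b", 200)]
poller = ChangePoller()
first = poller.poll(index)           # a and b
index += [Doc("c", 200), Doc("d", 300)]
second = poller.poll(index)          # c and d; b is re-read but discarded
```

Querying with "greater than or equal" and discarding known ids is what keeps `c` from being lost when it lands in the same millisecond as the checkpoint.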

Thanks
Vineeth


(David Pilato) #2

If you are right, it's a very good idea!

David :wink:

In reply to Vineeth Mohan (vineethmohan@algotree.com), 27 Oct 2011 at 16:23.


(Benjamin Devèze) #3

Hi,

If you don't provide a timestamp when indexing or updating a document, and you
enable the timestamp in the mapping, then each document is associated with
a timestamp corresponding to its indexing time.
The timestamp field is indexed by default and can be queried as a date
field, if I remember correctly. So if the clocks on your cluster are
synchronized, you can query all the docs that have changed since a
specified moment.

So I guess it could work.
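
As a rough illustration of what such a request might look like, here is a sketch that only builds the JSON bodies and sends nothing. The `_timestamp` field name, the `enabled` mapping option and the `gt` range operator are as I recall them from the Elasticsearch documentation of that era, so check them against the version you run. The function names and the `tweet` type are made up for the example.

```python
import json


def timestamp_mapping(doc_type):
    # Mapping fragment that turns on the automatic timestamp for one type.
    return {doc_type: {"_timestamp": {"enabled": True}}}


def changes_since_query(since_ms, size=100):
    # Range query over the timestamp field, oldest documents first.
    return {
        "query": {"range": {"_timestamp": {"gt": since_ms}}},
        "sort": [{"_timestamp": {"order": "asc"}}],
        "size": size,
    }


# An arbitrary epoch-millisecond checkpoint, used only for display.
print(json.dumps(timestamp_mapping("tweet"), indent=2))
print(json.dumps(changes_since_query(1319725380000), indent=2))
```

Sorting by the timestamp lets the caller take the last hit of each page as the next checkpoint.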


(vineeth mohan) #4

We don't have to synchronize the time.

The endpoint (in this case ES) only has to remember the timestamp it
fetched up to last time,
and then run a "get me all since this timestamp" query.

Thanks
Vineeth

In reply to Benjamin Devèze (benjamin.deveze@gmail.com), Thu, Oct 27, 2011 at 8:03 PM.


(Benjamin Devèze) #5

I mean that if you want a "get me all since this timestamp" query in a
distributed environment and you rely on the automatic timestamp of ES, you have
to make sure that all your nodes have the same time.
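
A small model of why the clocks matter, with made-up numbers: node A's clock is exact and node B's runs 5 seconds behind. The helper names are hypothetical. The `overlap_ms` window at the end is a mitigation added here for illustration, not something proposed in this thread, and it only helps if the window is larger than the worst skew and the caller discards ids it has already seen.

```python
def stamp(true_time_ms, node_offset_ms):
    # Timestamp a node assigns: true time plus that node's clock error.
    return true_time_ms + node_offset_ms


def poll_since(docs, checkpoint_ms, overlap_ms=0):
    # Ids stamped after the checkpoint, optionally re-reading an older window.
    return [doc_id for doc_id, ts in docs if ts > checkpoint_ms - overlap_ms]


docs = [("a1", stamp(10_000, 0))]            # indexed on A at true time 10 s
checkpoint = max(ts for _, ts in docs)       # the poller remembers 10_000

docs.append(("b1", stamp(12_000, -5_000)))   # indexed later on B, stamped 7_000

missed = poll_since(docs, checkpoint)                       # b1 is never returned
recovered = poll_since(docs, checkpoint, overlap_ms=6_000)  # re-reads a1, finds b1
```

Document `b1` was written after the checkpoint in real time, but its stamp falls before it, so a plain "since this timestamp" query skips it.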


(vineeth mohan) #6

Oh OK, yes, that makes sense.

But if the current deletion implementation has no workaround for this, how are
cases like "delete a document based on TTL even if a node was taken out of the
cluster and a new one, not time-synced with the first, is put in its place"
handled?

Thanks
Vineeth

In reply to Benjamin Devèze (benjamin.deveze@gmail.com), Thu, Oct 27, 2011 at 8:58 PM.


(Benjamin Devèze) #7

The TTL feature assumes that the time is synced cluster-wide.


(vineeth mohan) #8

That means that if the time is not synced up and some reordering happens,
we might lose a document before it "actually" expires.
Looks like a bug to me.
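
To make the concern concrete, here is a toy calculation with assumed numbers. It models expiry as "indexing timestamp plus TTL, compared against the checking node's clock" and is not a description of how Elasticsearch actually purges expired documents. All names are hypothetical.

```python
def expires_at(indexed_ts_ms, ttl_ms):
    # Expiry instant derived from the indexing timestamp.
    return indexed_ts_ms + ttl_ms


def is_expired(expiry_ms, node_now_ms):
    # The check as seen by whichever node performs it.
    return node_now_ms >= expiry_ms


TRUE_NOW = 100_000   # assumed "real" time in ms when the document is indexed
TTL = 60_000         # the document should live for 60 s

expiry = expires_at(TRUE_NOW, TTL)   # stamped by a node with a correct clock

# 30 s later in real time, the same document is checked by two nodes.
synced_now = TRUE_NOW + 30_000
fast_now = TRUE_NOW + 30_000 + 45_000   # this node's clock runs 45 s ahead

on_time_check = is_expired(expiry, synced_now)   # False: 30 s of life remain
early_check = is_expired(expiry, fast_now)       # True: deleted 30 s too early
```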

Thanks
Vineeth

In reply to Benjamin Devèze (benjamin.deveze@gmail.com), Thu, Oct 27, 2011 at 9:20 PM.


(Electric Owl) #9

On Oct 27, 5:23 pm, Vineeth Mohan vineethmo...@algotree.com wrote:

that means that if the time is not synced up and if some reordering happens
we might lose a document before it "actually" expires.
Looks like a bug to me.

Presumably if you are operating a cluster you will have all the
servers getting their time from the same NTP server and syncing on a
regular basis.

I couldn't say whether it is a bug or not but maybe you need to
consider your system/network architecture first before declaring it
so.

Ian


(vineeth mohan) #10

I have operated such large-scale systems backed by NTP.
I have also seen the NTP daemon go down and not come back up.
It was also fairly common for some newly added machines not to have
NTP running on them (yes, even after using
configuration engines like CFEngine, etch, etc.).
Since the risk here is deletion rather than some trivial issue, I had
this concern.

Thanks
Vineeth

In reply to Electric Owl (ian.lewis.65@gmail.com), Thu, Oct 27, 2011 at 10:08 PM.


(vineeth mohan) #11

To add to this, I have also seen time moving at a faster rate in a
virtual machine (yes, it was a bug in the application that created that
virtual machine).

Thanks
Vineeth

In reply to Vineeth Mohan (vineethmohan@algotree.com), Thu, Oct 27, 2011 at 10:18 PM.

