Notifications from Elasticsearch when documents are added


(vineeth mohan) #1

In CouchDB there is a feature where any external application
interested in a particular document can hook into CouchDB, and when that
document is available, CouchDB notifies the external application.

Is there a similar feature in Elasticsearch?
Basically, I have another service running elsewhere that is
interested in documents added to Elasticsearch.

Thanks
Vineeth


(David Pilato) #2

+1 for this feature if it doesn't exist yet !

David :wink:



(Shay Banon) #3

There isn't an option for that. How do you envision the notification, as a
pull or push service? With couchdb, do you refer to the _changes API?



(vineeth mohan) #4

Yes, I mean the _changes API
(http://guide.couchdb.org/draft/notifications.html - continuous
notifications).

I am sure this feature would help many people with their integration work.

Thanks
Vineeth



(Shay Banon) #5

This feature is tricky to implement because elasticsearch is built quite
differently from couchdb. Also, couchdb has it easy since it's a single node.



(vineeth mohan) #6

If someone can give tips and clues on how to implement this, I am willing to
make an open source plugin out of it.

Thanks
Vineeth



(Mahendra M) #7

Hi Vineeth,

Would it be better to implement this feature outside of ElasticSearch?

I guess you already have a service which keeps adding docs to
ElasticSearch. You can add a hook there to notify your other systems. I
think this is better because in most cases you will not be using ES as
a main data store.
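
A minimal sketch of that idea, assuming the indexing service is under your control. The class name `NotifyingIndexer` and the `index_fn` callable are hypothetical; `index_fn` stands in for whatever call actually sends the document to ES, and is replaced here by an in-memory dict so the example runs without a cluster.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Listener = Callable[[str, Document], None]


class NotifyingIndexer:
    """Wraps the function that writes to the search index and tells
    registered listeners about each successful write."""

    def __init__(self, index_fn: Callable[[str, Document], None]) -> None:
        self._index_fn = index_fn          # hypothetical: the real call to ES goes here
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def index(self, doc_id: str, doc: Document) -> None:
        self._index_fn(doc_id, doc)        # write first; an exception skips the notifications
        for listener in self._listeners:   # notify only after the write succeeded
            listener(doc_id, doc)


# Stand-in for the real indexing call, so the sketch runs offline.
stored: Dict[str, Document] = {}
seen: List[str] = []

indexer = NotifyingIndexer(lambda doc_id, doc: stored.update({doc_id: doc}))
indexer.subscribe(lambda doc_id, doc: seen.append(doc_id))
indexer.index("1", {"title": "hello"})
```

The listeners are called synchronously here; a real service would more likely hand the event to a queue so a slow consumer cannot hold up indexing.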

Regards,
Mahendra


--
Mahendra

http://twitter.com/mahendra


(Mahendra M) #8

Hi Shay,

CouchDB has distributed node support via BigCouch. They also have a
_changes interface which can be used for getting notifications on an
update to any node.

Regards,
Mahendra


--
Mahendra

http://twitter.com/mahendra


(vineeth mohan) #9

Hello Mahendra,

Thanks for your reply.
I am planning to use Elasticsearch as the main data store.
I initially thought of using CouchDB with ES,
but then I found that with the mapper-attachments plugin I can index PDF
files (and most other widely used formats).
By configuring ES to store the attachment field, I am able to store
and retrieve the documents I need in ES itself.
I don't see why I should use CouchDB+ES when I can store, index and
retrieve documents in ES itself.

Thanks
Vineeth



(Shay Banon) #10

Yea, but that heavily relies on how couchdb works internally, which is very
different than how elasticsearch works.



(vineeth mohan) #11

Anyway, when a document is added to a node, that node notifies all the other
nodes.
Can't we add a plugin point at the place where documents are accepted via
HTTP or when replication happens?
I am sure many people could use this plugin point to build custom plugins.

Thanks
Vineeth



(Shay Banon) #12

The problem here is not hooking into the point where a document is added.
It's more about how those notifications will be sent.

Is it similar to the _changes feed, where you continuously get changes? In
that case, what happens if you want to get previous docs as well, or when
the client being notified was disconnected and would like to get all the
changes that happened while it was disconnected?

Or is it a push feed? In that case, how do we push those changes
(transport type)? What happens when the client is not there to receive the
changes, or it's just slow?

The feature itself is much more heavyweight than it might seem.



(vineeth mohan) #13

Is there a global revision number in Elasticsearch, like the one we have in SVN?
If there is, the problem Shay is describing can be resolved.
For example, if the revision number was 210 when the client disconnected, then
when it reconnects it can request all events from revision 210 up to the
current one.
I believe the global revision number could be implemented in the plugin
itself.
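
A rough sketch of the catch-up idea, assuming a single process that sees every write. The `ChangeLog` class and its method names are made up for illustration, and the sketch says nothing about how such a counter would be kept consistent across several nodes.

```python
from typing import Dict, List, Tuple


class ChangeLog:
    """Append-only log that gives every write an increasing sequence number."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, str]] = []   # (seq, doc_id)
        self._seq = 0

    def record(self, doc_id: str) -> int:
        self._seq += 1
        self._entries.append((self._seq, doc_id))
        return self._seq

    def since(self, seq: int) -> Dict[str, object]:
        """Return every change recorded after `seq`, plus the latest sequence number."""
        results = [{"seq": s, "id": d} for s, d in self._entries if s > seq]
        return {"results": results, "last_seq": self._seq}


log = ChangeLog()
for doc_id in ["a", "b", "c"]:
    log.record(doc_id)

checkpoint = log.since(0)["last_seq"]   # the client has seen everything up to here
log.record("d")                         # written while the client is disconnected
log.record("a")                         # an update to an existing document
missed = log.since(checkpoint)          # on reconnect, ask only for what was missed
```

The client only has to remember one number between connections, which is what makes reconnecting cheap.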

I wonder how this is handled in CouchDB!
Mahendra, can you shed some light on this? Say that in CouchDB, while receiving
continuous notifications, the connection gets
lost.
How will the external application catch up with the changes once the
connection is established again?

Thanks
Vineeth



(vineeth mohan) #14

CouchDB has a concept similar to the revision number in source
control systems like SVN.
It maintains a revision number (an update sequence) for each database.
Hence we can make a curl request to CouchDB like

curl -X GET "$HOST/db/_changes?since=REVISION_NUMBER"

It will fetch only the changes made after REVISION_NUMBER.
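
As a sketch of how a client might use this, the helpers below build the request URL and pull the resume point out of a response. The function names are hypothetical, the host and database are placeholders, and the sample body is a hand-written response in the shape CouchDB returns (a `results` list plus `last_seq`), assuming integer sequence numbers.

```python
import json
from typing import List, Tuple
from urllib.parse import quote, urlencode


def changes_url(host: str, db: str, since: int) -> str:
    """Build the _changes URL for everything after sequence number `since`."""
    return f"{host}/{quote(db, safe='')}/_changes?{urlencode({'since': since})}"


def parse_changes(body: str) -> Tuple[List[str], int]:
    """Return the changed document ids and the sequence number to resume from."""
    payload = json.loads(body)
    ids = [row["id"] for row in payload["results"]]
    return ids, payload["last_seq"]


# Canned response, so the sketch runs without a CouchDB server.
sample = (
    '{"results": [{"seq": 211, "id": "doc-a", "changes": [{"rev": "1-abc"}]}],'
    ' "last_seq": 211}'
)
url = changes_url("http://localhost:5984", "db", 210)
ids, resume_from = parse_changes(sample)
```

The client would store `resume_from` and pass it as `since` on the next request.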

Let me know your thoughts. If you feel this idea is workable, it will help
lots of folks out there.

Thanks
Vineeth



(Otis Gospodnetić) #15

Btw. does ES have something like Solr's UpdateRequestProcessor? I
think the answer is negative, but if it had/has that, would that be a
good place to plug in custom "new document listeners"?

Otis




(Karel Minarik) #16

In my opinion, you can use percolation in a similar manner to a filtered
CouchDB _changes feed: you'll receive matching queries for every
document being indexed.
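
To make the idea concrete, here is a conceptual model of percolation in plain Python. It is not the Elasticsearch percolator API: the `Percolator` class, its methods and the query names are all invented for illustration, and queries are ordinary predicates rather than ES query DSL.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Query = Callable[[Document], bool]


class Percolator:
    """Stores named queries and reports which ones match an incoming document."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, name: str, query: Query) -> None:
        self._queries[name] = query

    def percolate(self, doc: Document) -> List[str]:
        return sorted(name for name, q in self._queries.items() if q(doc))


p = Percolator()
p.register("all-docs", lambda doc: True)                       # matches every document
p.register("pdf-only", lambda doc: doc.get("type") == "pdf")   # a filtered subscription

matches_pdf = p.percolate({"type": "pdf", "title": "report"})
matches_txt = p.percolate({"type": "txt", "title": "notes"})
```

Note that in this model the matches come back to whoever submits the document, so something would still have to relay them to the interested application.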



(vineeth mohan) #17

Hello Karmi,

Can you explain how to use percolation to achieve the feature I want?

My whole idea is to decouple the module that indexes the documents from the
external applications that are interested in new documents.

AFAIK percolation provides a way to reverse-map queries.
That is, a query can be registered, and when a new document comes in, the percolator
tells which of the registered queries match it.

How can this feature be used by an external application to determine whether a
new document has been indexed?

The feature I am looking for is well implemented in CouchDB. If you have any
doubts, please go through http://guide.couchdb.org/draft/notifications.html
They have something like the global revision number seen in source code
repositories like SVN. On each poll, the external application could tell ES to return
all documents that were added/edited since this revision number.

Thanks
Vineeth



(David Richardson) #18

How then would one push change events into rabbitmq, or some other message
broker? Not my preferred mechanism, since rabbitmq isn't distributed, but
perhaps that isn't so important for change events. Soft realtime required,
polling not allowed. Doing this "in the (external) app" isn't viable.

Change notifications and WAN replication are really the only things missing
in es that preclude decommissioning our couchdb infrastructure - which we
would like to do since virtually every query against it must already go through
external search. Getting change events into an external message broker
provides an immediate solution to both, but perhaps that's no easier than an
internal changes feed.

btw, Postgresql provides an even better model for external notifications
(http://www.postgresql.org/docs/9.0/interactive/sql-notify.html) imho - multiple
channels plus a programmable payload. Have no experience
with it at extreme load, but under moderate load it works wonderfully.
Again, radically different technical environment - it's the api model that's
of interest. What we're talking about for ES is a river producer, rather
than consumer.

cheers,
d.r.


(Shay Banon) #19

Let me do a quick brain dump here, and try to explain what needs to be done
to properly support this:

First, one can (with my help, or looking at the code) write a plugin that
registers for indexing operations. The listener can also check and make sure
to only process events that happen on a primary shard (so they won't be
processed on the replica, if a design requires it). But, to be honest, this
is the easy part.
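
To make the listener idea a bit more concrete, here is a minimal sketch in plain Python. It is not the elasticsearch plugin API: `IndexEvent`, `PrimaryOnlyListener` and the `primary` flag are hypothetical names, used only to show the "process on the primary shard only" filter.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class IndexEvent:
    """Hypothetical event describing one indexing operation on a shard copy."""
    index: str
    doc_id: str
    shard_id: int
    primary: bool  # True when the event was observed on the primary copy


class PrimaryOnlyListener:
    """Forwards events to a callback, skipping those observed on replicas."""

    def __init__(self, callback: Callable[[IndexEvent], None]) -> None:
        self._callback = callback

    def on_index(self, event: IndexEvent) -> None:
        # The same operation is also seen on replica copies, so filtering on
        # the primary flag gives one notification per operation.
        if event.primary:
            self._callback(event)


seen: List[str] = []
listener = PrimaryOnlyListener(lambda e: seen.append(e.doc_id))
listener.on_index(IndexEvent("tweets", "1", shard_id=0, primary=True))
listener.on_index(IndexEvent("tweets", "1", shard_id=0, primary=False))  # replica copy
listener.on_index(IndexEvent("tweets", "2", shard_id=1, primary=True))
```

Only the two primary-side events reach the callback here; the replica-side copy of document "1" is ignored.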

For a changes feed, one has two options: pull and push.

Let's start with push. Push notifications are quite simple to implement in a
non-distributed solution (as redis does). People register listeners, and
every time an operation happens, the listeners get notified. It does require
some thought as to how to publish those events. If one controls the clients
as well, then it's simple (i.e. doing it only for the Java API), but since
elasticsearch treats HTTP as a first class citizen, a solution for HTTP
needs to be built as well. This can be similar to pubsubhubbub
(http://code.google.com/p/pubsubhubbub/), but then the clients also need to
listen for HTTP requests.
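
The non-distributed case can be sketched in a few lines. This is only an in-memory illustration, assuming a single process and synchronous callbacks; `PushRegistry` and its methods are made-up names, not anything elasticsearch or redis provides.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Listener = Callable[[Dict], None]


class PushRegistry:
    """In-memory, single-process publish/subscribe keyed by index name."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def register(self, index: str, listener: Listener) -> None:
        self._listeners[index].append(listener)

    def publish(self, index: str, event: Dict) -> int:
        """Notify every listener registered for `index`; return how many were called."""
        # .get avoids creating an empty entry for indices nobody listens to
        listeners = self._listeners.get(index, [])
        for listener in listeners:
            listener(event)
        return len(listeners)


registry = PushRegistry()
received: List[Dict] = []
registry.register("tweets", received.append)
delivered = registry.publish("tweets", {"op": "index", "id": "1"})
ignored = registry.publish("users", {"op": "index", "id": "7"})
```

Everything that makes the real problem hard (HTTP clients, slow consumers, persistence, shards moving between nodes) is deliberately left out of this sketch.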

Also, with push notifications, there is the question of whether we only send
"new" events to the listener, or first send all the current data (possibly
filtered) and then new events as they happen. There is also the question of
what to do with misbehaving endpoints that don't process notifications fast
enough (tricky to identify...): block them, drop them, or something similar.
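
One possible policy for slow endpoints, sketched below, is a bounded per-listener buffer that drops the oldest event when full and counts the drops, so the server can later decide to block or disconnect that listener. `BoundedOutbox` and the capacity value are assumptions for illustration, and drop-oldest is just one of the options mentioned above.

```python
from collections import deque
from typing import Deque, Dict, List


class BoundedOutbox:
    """Per-listener buffer that drops the oldest event once it is full."""

    def __init__(self, capacity: int) -> None:
        self._events: Deque[Dict] = deque(maxlen=capacity)
        self.dropped = 0

    def offer(self, event: Dict) -> None:
        if len(self._events) == self._events.maxlen:
            # The deque is about to evict its oldest entry; count it so a
            # chronically slow listener can be detected.
            self.dropped += 1
        self._events.append(event)

    def drain(self) -> List[Dict]:
        """Return and clear everything still buffered, oldest first."""
        events = list(self._events)
        self._events.clear()
        return events


outbox = BoundedOutbox(capacity=2)
for i in range(5):
    outbox.offer({"id": i})
pending = outbox.drain()
```

With a capacity of 2 and five events offered, the listener ends up with the two most recent events and a drop count of 3.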

Also, there is the question of whether the listeners are persistent. If they
go away, do we queue events and send them once they reconnect?

Now, let's move to a distributed solution. Let's start with simple HA:
replication. We need to make sure that listener registrations are
persisted across the cluster (and possibly survive a full cluster restart).
We also need to make sure that, as shards move around, those listeners move
around with them. And if we support persistent notifications, we need the
queue of future events that must be sent to disconnected clients to be
replicated as well (and we need to recover it, get this data into hot
relocation of shards, and so on).

Now, let's talk about pull notifications, which is similar to how couchdb
does things. First, a note on couchdb. The data structure couchdb has
(basically, a never ending (up to compaction) btree) is a big boon when it
comes to implementing pull notifications. elasticsearch/lucene do not work
like that.

Pull notifications will probably require an API-based invocation of "give me
changes since X". X can be a timestamp, or an id that denotes some sort of
"timeability"/order. A user will need to register the fact that it starts
listening, and we in elasticsearch can make sure that any changes are kept
around for the next pull request the user makes (either on an open HTTP
connection, or per request; it does not really matter). This is a bit simpler
to implement in elasticsearch: we can keep the transaction log around long
enough until we have notified all clients about the changes, and it allows
us to do async notifications more easily. But it still requires delicate
control over the transaction log and when we can safely "get rid" of it.
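
The "changes since X" idea, with the log kept only until every registered consumer has caught up, could be modelled roughly as follows. This is a toy in-memory model and not how the elasticsearch transaction log works; `ChangeLog`, the sequence numbers and the acknowledgement step are assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

Change = Tuple[int, str]  # (sequence number, document id)


class ChangeLog:
    """Append-only log, trimmed once every registered consumer has acknowledged."""

    def __init__(self) -> None:
        self._entries: List[Change] = []
        self._next_seq = 1
        self._acked: Dict[str, int] = {}  # consumer name -> last acknowledged seq

    def register(self, consumer: str) -> None:
        # A new consumer starts from "now": everything so far counts as seen.
        self._acked[consumer] = self._next_seq - 1

    def append(self, doc_id: str) -> int:
        seq = self._next_seq
        self._entries.append((seq, doc_id))
        self._next_seq += 1
        return seq

    def changes_since(self, seq: int) -> List[Change]:
        return [c for c in self._entries if c[0] > seq]

    def ack(self, consumer: str, seq: int) -> None:
        self._acked[consumer] = max(self._acked[consumer], seq)
        self._trim()

    def _trim(self) -> None:
        # Entries at or below the slowest consumer's position are no longer needed.
        low_water = min(self._acked.values())
        self._entries = [c for c in self._entries if c[0] > low_water]

    def retained(self) -> int:
        return len(self._entries)


log = ChangeLog()
log.register("a")
log.register("b")
for doc in ("d1", "d2", "d3"):
    log.append(doc)
batch = log.changes_since(0)
log.ack("a", 3)   # "b" has acknowledged nothing yet, so nothing can be trimmed
kept_after_a = log.retained()
log.ack("b", 2)   # slowest consumer is now at 2, so entries 1 and 2 can go
kept_after_b = log.retained()
```

The slowest consumer decides how much of the log must be retained, which is the "when can we safely get rid of it" question in miniature.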

Also, pull notifications require thought as to how to provide all the
"current" data in elasticsearch. Again, it's certainly possible, and the user
can provide a query that will filter that data if not all data is needed.

In terms of the internals of how elasticsearch works, pull notification is
simpler, but it still requires delicate work when it comes to concurrency and
transaction log handling, which are pretty low level... Not simple.

Summary:

One of the things left on the plate for elasticsearch is cross data center
replication. I would love to implement it in a way that the cross data
center replication mechanism is open enough for users to use. What does that
mean? For example, if we do pull based notifications, we can possibly
utilize that for cross data center replication. Another cluster, halfway
around the world, is just another user of the pull based notifications.
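
Seen from the consuming side, "another cluster is just another user of the pull feed" could look something like the sketch below. It is purely illustrative: `pull_changes` stands in for a remote "changes since X" call that does not exist in elasticsearch today, and `Replicator` with its checkpoint is a hypothetical name.

```python
from typing import Dict, List, Tuple

Change = Tuple[int, str, dict]  # (sequence number, document id, document body)


def pull_changes(source: List[Change], since: int) -> List[Change]:
    """Stand-in for a remote 'changes since X' call against the source cluster."""
    return [c for c in source if c[0] > since]


class Replicator:
    """Copies changes into a local store and remembers how far it has got."""

    def __init__(self) -> None:
        self.store: Dict[str, dict] = {}
        self.checkpoint = 0

    def sync(self, source: List[Change]) -> int:
        changes = pull_changes(source, self.checkpoint)
        for seq, doc_id, body in changes:
            self.store[doc_id] = body  # last write wins, so replays are harmless
            self.checkpoint = seq
        return len(changes)


source_log: List[Change] = [(1, "a", {"v": 1}), (2, "b", {"v": 1}), (3, "a", {"v": 2})]
replica = Replicator()
first = replica.sync(source_log)
source_log.append((4, "c", {"v": 1}))
second = replica.sync(source_log)
```

The second `sync` only fetches what arrived after the stored checkpoint, which is what would let a remote cluster resume after a WAN outage.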

Hope things make a bit more sense now... :slight_smile:


(vineeth mohan) #20

Thanks for taking the time to share all your thoughts.
I am happy that you are thinking about the problem the same way we do.
Once this feature exists, we can write an ES river to replicate ES
over to another cluster.

Also, I have made a feature request for this -

Thanks
Vineeth
