Notifications from Elasticsearch when documents are added

Hi Shay, would you mind giving some pointers on how exactly one should
create a plugin to register for indexing operations? Where can I find the
hooks?

Regards

On Saturday, August 13, 2011 3:35:24 PM UTC-4, kimchy wrote:

Let me do a quick brain dump here, and try to explain what needs to be
done to properly support this:

First, one can (with my help, or looking at the code) write a plugin that
registers for indexing operations. The listener can also check and make
sure to only process events that happen on a primary shard (so they won't
be processed on the replica, if a design requires it). But, to be honest,
this is the easy part.

For a changes feed, one has two options: pull and push.

Let's start with push. Push notifications are quite simple to implement in
a non-distributed solution (as redis does). People register listeners and
every time an operation happens, the listeners get notified. It does
require some thought as to how to publish those events. If one controls the
clients as well, then it's simple (i.e. doing it only for the Java API),
but, since elasticsearch treats HTTP as a first-class citizen, a solution
for HTTP needs to be built as well. This can be similar to PubSubHubbub,
but then the clients also need to listen for HTTP requests.

Also, with push notifications, there is the question of whether we only send
"new" events to the listener, or also send all the current data (possibly
filtered) and, later, new events as they happen. There is also the question of
what to do with misbehaving endpoints that don't process notifications fast
enough (tricky to identify...): block them, drop them, or something similar.
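A sketch of one of the policies mentioned above (dropping an endpoint that cannot keep up), assuming a single JVM and a hypothetical `PushHub` class. Each subscriber gets its own bounded queue, publishing never blocks the indexing path, and a subscriber whose queue overflows is unsubscribed. HTTP delivery, persistence and distribution are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process push hub: one bounded queue per subscriber.
final class PushHub {
    private final int capacity;
    private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();
    private final List<String> dropped = new ArrayList<>();

    PushHub(int capacity) {
        this.capacity = capacity;
    }

    // Registers a subscriber and returns the queue it is expected to drain.
    BlockingQueue<String> subscribe(String subscriberId) {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(capacity);
        queues.put(subscriberId, q);
        return q;
    }

    // Offers the event to every subscriber without blocking; a subscriber whose
    // queue is full is treated as misbehaving and removed.
    synchronized void publish(String event) {
        for (Map.Entry<String, BlockingQueue<String>> e : queues.entrySet()) {
            if (!e.getValue().offer(event)) {
                queues.remove(e.getKey()); // safe while iterating a ConcurrentHashMap
                dropped.add(e.getKey());
            }
        }
    }

    synchronized List<String> droppedSubscribers() {
        return new ArrayList<>(dropped);
    }

    boolean isSubscribed(String subscriberId) {
        return queues.containsKey(subscriberId);
    }
}
```

Blocking instead of dropping would mean replacing `offer` with `put`, which lets one slow consumer stall every publisher.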

Also, there is the question of whether listeners are persistent. If they go
away, do we queue events and send them once they reconnect?

Now, let's move to a distributed solution. Let's start with simple HA,
replication. Now, we need to make sure that listener registrations are
persisted across the cluster (and possibly survive a full cluster restart).
Also, we need to make sure that, as shards move around, those listeners move
around with them. Also, if we support persistent notifications, we need to
make sure the queue of future events that need to be sent to disconnected
clients is replicated as well (and we need to recover it, get this data into
hot relocation of shards, and so on).

Now, let's talk about pull notifications, which is similar to how couchdb
does things. First, a note on couchdb. The data structure couchdb has
(basically, a never-ending (up to compaction) btree) is a big boon when it
comes to implementing pull notifications. elasticsearch/lucene do not work
like that.

Pull notifications will probably require an API call along the lines of "give
me changes since X". X can be a timestamp, or an id that denotes some sort of
"timeability"/order. A user will need to register the fact that it is starting
to listen, and we in elasticsearch can make sure that any changes are kept
around for the next pull request the user makes (either on an open HTTP
connection, or per request, does not really matter). This is a bit simpler
to implement in elasticsearch: we can keep the transaction log around until
we have notified all clients about the changes, and it allows us to do async
notifications more easily. But it still requires delicate control over the
transaction log and over when we can safely "get rid" of it.
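The "changes since X" model can be sketched as an append-only log that uses a monotonically increasing sequence number as X, plus a checkpoint per registered consumer. Entries are discarded only once every consumer has acknowledged them, mirroring the idea of keeping the transaction log until all clients have been notified. `ChangeLog` and its methods are hypothetical, and this in-memory version says nothing about how the real translog is managed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory change log with per-consumer checkpoints.
final class ChangeLog {
    private final TreeMap<Long, String> entries = new TreeMap<>();
    private final Map<String, Long> checkpoints = new HashMap<>();
    private long nextSeq = 1;

    // Records a change and returns its sequence number.
    synchronized long append(String change) {
        long seq = nextSeq++;
        entries.put(seq, change);
        return seq;
    }

    // A consumer registers before pulling; it has seen everything up to 'since'.
    synchronized void register(String consumer, long since) {
        checkpoints.put(consumer, since);
    }

    // All retained changes with a sequence number greater than 'since'.
    synchronized List<String> changesSince(long since) {
        return new ArrayList<>(entries.tailMap(since, false).values());
    }

    // Records that 'consumer' has processed everything up to 'seq', then drops
    // entries that every registered consumer has acknowledged.
    synchronized void acknowledge(String consumer, long seq) {
        checkpoints.merge(consumer, seq, Long::max);
        long min = checkpoints.values().stream().mapToLong(Long::longValue).min().orElse(0L);
        entries.headMap(min, true).clear();
    }

    synchronized int retained() {
        return entries.size();
    }
}
```

A consumer that registers and never acknowledges pins the log forever, which is exactly the kind of retention decision that needs delicate control.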

Also, pull notifications require thought as to how to provide all the
"current" data in elasticsearch. Again, it's certainly possible, and the
user can provide a query that will filter that data if not all data is
needed.
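Serving the "current" data first and then new changes can be sketched as a filtered snapshot that also records the sequence number it reflects, so the client resumes pulling from exactly that point. `VersionedStore` and `Snapshot` are hypothetical names, a `Predicate` stands in for the user-supplied filter query, and the `synchronized` methods only emulate, inside one JVM, the atomicity a real implementation would need.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical store that remembers every change in order (change N at index N-1).
final class VersionedStore {
    private final Map<String, String> docs = new LinkedHashMap<>();
    private final List<Map.Entry<String, String>> changes = new ArrayList<>();

    // Stores a document and returns the sequence number of this change (1-based).
    synchronized long put(String id, String source) {
        docs.put(id, source);
        changes.add(Map.entry(id, source));
        return changes.size();
    }

    // Current documents matching the filter, plus the sequence number they reflect.
    synchronized Snapshot snapshot(Predicate<String> filter) {
        Map<String, String> matching = new LinkedHashMap<>();
        docs.forEach((id, src) -> {
            if (filter.test(src)) {
                matching.put(id, src);
            }
        });
        return new Snapshot(matching, changes.size());
    }

    // Ids of matching documents changed after sequence number 'seq'.
    synchronized List<String> changesSince(long seq, Predicate<String> filter) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> c : changes.subList((int) seq, changes.size())) {
            if (filter.test(c.getValue())) {
                ids.add(c.getKey());
            }
        }
        return ids;
    }
}

// Hypothetical snapshot result: matching documents and the point to resume from.
final class Snapshot {
    final Map<String, String> docs;
    final long seq;

    Snapshot(Map<String, String> docs, long seq) {
        this.docs = docs;
        this.seq = seq;
    }
}
```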

In terms of the internals of how elasticsearch works, pull notification is
simpler, but it still requires delicate, pretty low-level work when it comes
to concurrency and transaction log handling. Not simple.

Summary:

One of the things left on the plate for elasticsearch is cross-data-center
replication. I would love to implement it in a way that makes the
cross-data-center replication mechanism open enough for users to use. What
does that mean? For example, if we do pull-based notifications, we can
possibly utilize them for cross-data-center replication. Another cluster,
halfway around the world, is just another user of the pull-based notifications.
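Treating a remote cluster as "just another user" of pull notifications could look roughly like this follower loop: it asks for changes after its last applied sequence number, applies each one, and advances its checkpoint only after applying it. `Change`, `ChangeSource` and `Follower` are hypothetical names; the point is that the follower needs nothing beyond a changes-since API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change record: sequence number, document id, new source (null means delete).
final class Change {
    final long seq;
    final String id;
    final String source;

    Change(long seq, String id, String source) {
        this.seq = seq;
        this.id = id;
        this.source = source;
    }
}

// Hypothetical pull API exposed by the origin cluster.
interface ChangeSource {
    List<Change> changesSince(long seq, int maxBatch);
}

// Follower cluster state: applies batches in order and remembers its checkpoint.
final class Follower {
    final Map<String, String> docs = new HashMap<>();
    long checkpoint = 0;

    // Pulls until caught up; returns how many changes were applied.
    int catchUp(ChangeSource source, int maxBatch) {
        int applied = 0;
        while (true) {
            List<Change> batch = source.changesSince(checkpoint, maxBatch);
            if (batch.isEmpty()) {
                return applied;
            }
            for (Change c : batch) {
                if (c.source == null) {
                    docs.remove(c.id);
                } else {
                    docs.put(c.id, c.source);
                }
                checkpoint = c.seq; // advance only after the change is applied
                applied++;
            }
        }
    }
}
```

Because the checkpoint moves only after a change is applied, a follower that crashes mid-batch re-pulls from its last applied change rather than skipping any.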

Hope things make a bit more sense now... :slight_smile:

On Sat, Aug 13, 2011 at 8:30 PM, David Richardson <david.ri...@enquora.com> wrote:

How then would one push change events into rabbitmq, or some other
message broker? Not my preferred mechanism, since rabbitmq isn't
distributed, but perhaps that isn't so important for change events. Soft
realtime required, polling not allowed. Doing this "in the (external) app"
isn't viable.

Change notifications and WAN replication are really the only things
missing in es that preclude decommissioning our couchdb infrastructure,
which we would like to do since virtually every query against it must already
go through external search. Getting change events into an external message
broker provides an immediate solution to both, but perhaps that's no easier
than an internal changes feed.

btw, Postgresql provides an even better model for external notifications,
imho (http://www.postgresql.org/docs/9.0/interactive/sql-notify.html):
multiple channels plus a programmable payload. Have no experience
with it at extreme load, but under moderate load it works wonderfully.
Again, radically different technical environment; it's the API model
that's of interest. What we're talking about for ES is a river producer,
rather than a consumer.
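The API shape referred to here (named channels, a free-form payload per notification, listeners subscribing by channel name) can be mimicked in-process as below. `NotifyBus` is a hypothetical name; this sketch copies only the shape of PostgreSQL's LISTEN/NOTIFY, not its semantics, such as delivering notifications only when the sending transaction commits.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical channel-based notifier: listeners subscribe to a channel name and
// receive the string payload attached to each notification on that channel.
final class NotifyBus {
    private final Map<String, List<Consumer<String>>> channels = new ConcurrentHashMap<>();

    void listen(String channel, Consumer<String> listener) {
        channels.computeIfAbsent(channel, c -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // Delivers the payload to every listener of the channel; returns how many were notified.
    int publish(String channel, String payload) {
        List<Consumer<String>> listeners = channels.getOrDefault(channel, List.of());
        listeners.forEach(l -> l.accept(payload));
        return listeners.size();
    }
}
```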

cheers,
d.r.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Look at org.elasticsearch.plugins.ShardsPluginsModule - if you implement a
plugin and register it to the plugins service by using the shardModules()
method, your modules have access to the shard level of Elasticsearch, for
example, indexing operations.

Jörg

On Mon, Jul 22, 2013 at 9:44 PM, Vinicius Carvalho <viniciusccarvalho@gmail.com> wrote:

[...]

Thanks for the reply, but I guess I should rephrase my question. From what
you've shown, it seems that at this point my module will have access to all
modules registered at the shard level, right? What I was actually interested
in is somehow hooking in a listener (much like you can for a Lifecycle
listener) and being notified when I have index operations (index, update,
delete). I guess I could just use the approach taken by the partial update
plugin and create an action that is executed before the index action. Just
trying to find out the proper way of doing it.

BTW: I'm trying to listen for notifications on the index, so I can
propagate them to a separate channel for push-style replication.

Regards

On Monday, July 22, 2013 4:18:09 PM UTC-4, Jörg Prante wrote:

[...]

Hey,

there is an IndexingOperationListener class which might help you (it is
used for percolation at the moment)

--Alex

On Tue, Jul 23, 2013 at 3:44 AM, Vinicius Carvalho <viniciusccarvalho@gmail.com> wrote:

[...]

ES executes indexing operations in a distributed manner. There is no
listener. Look at org.elasticsearch.action.index.TransportIndexAction. This
is where a Java API index operation is implemented at shard level. There is
a method, postPrimaryOperation(). At the moment, the percolator is
implemented there. To implement a push operation, this is the right place.
In a plugin, you can either derive your custom index action from
IndexAction, or copy/paste the code. Another option is sending a pull
request to the core code to extend the existing index action.
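The ordering that matters when hooking in after the primary operation (write first, then publish, so listeners never see a failed write) can be illustrated with a plain decorator. `IndexOperation`, `IndexResult` and `NotifyingIndexOperation` are hypothetical stand-ins, not `TransportIndexAction` or the signature of `postPrimaryOperation()`.

```java
import java.util.function.Consumer;

// Hypothetical result of a primary-shard write.
final class IndexResult {
    final String id;
    final long version;

    IndexResult(String id, long version) {
        this.id = id;
        this.version = version;
    }
}

// Hypothetical stand-in for the shard-level index operation.
interface IndexOperation {
    IndexResult executeOnPrimary(String id, String source);
}

// Decorator: runs the wrapped operation, then publishes the outcome.
// If the write throws, nothing is published.
final class NotifyingIndexOperation implements IndexOperation {
    private final IndexOperation delegate;
    private final Consumer<IndexResult> publisher;

    NotifyingIndexOperation(IndexOperation delegate, Consumer<IndexResult> publisher) {
        this.delegate = delegate;
        this.publisher = publisher;
    }

    @Override
    public IndexResult executeOnPrimary(String id, String source) {
        IndexResult result = delegate.executeOnPrimary(id, source);
        publisher.accept(result); // plays the role of a post-primary hook
        return result;
    }
}
```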

My favorite approach is adding a Guava event bus (with a bounded queue) at
node level for pushing all kinds of events across the cluster, with the
response addressed to the requesting client, which has established a
persistent connection (SPDY or websocket), by using a pubsub mechanism. When
working at the node level, the push operation is not limited to index
operations, only by the node resources. Registering for events would be
realized by the Guava event bus and pubsub subscribe actions.
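A rough, JDK-only approximation of a node-level event bus with a bounded queue. In a real plugin Guava's `AsyncEventBus` (which takes an `Executor`) would be the natural fit; here a single-thread `ThreadPoolExecutor` over an `ArrayBlockingQueue` makes the bound and the rejection behaviour explicit. `NodeEventBus` is a hypothetical name, and delivery to clients over SPDY or WebSocket is out of scope.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical node-level bus: events are dispatched on one worker thread fed by a
// bounded queue; when the queue is full, post() fails fast instead of blocking.
// Unlike Guava's EventBus, handlers are matched on the exact event class only.
final class NodeEventBus {
    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();
    private final ThreadPoolExecutor executor;

    NodeEventBus(int queueCapacity) {
        executor = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    <T> void subscribe(Class<T> type, Consumer<? super T> handler) {
        subscribers.computeIfAbsent(type, t -> new CopyOnWriteArrayList<>())
                .add(event -> handler.accept(type.cast(event)));
    }

    // Returns false if the event was rejected because the queue is full.
    boolean post(Object event) {
        List<Consumer<Object>> handlers = subscribers.getOrDefault(event.getClass(), List.of());
        try {
            executor.execute(() -> handlers.forEach(h -> h.accept(event)));
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    // Stops accepting events and waits briefly for queued ones to be delivered.
    void shutdownAndWait() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```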

For the work I have done so far, see
GitHub - jprante/elasticsearch-transport-websocket: WebSockets for ElasticSearch

Jörg

On Tue, Jul 23, 2013 at 3:44 AM, Vinicius Carvalho <viniciusccarvalho@gmail.com> wrote:

[...]

Thanks again Jörg. Just so you know, I'm actually considering using kafka
for inter-cluster replication. We want to push the index operations to a
topic, and then other clusters in different DCs would subscribe to it.
Conflict resolution will be last-commit-wins. In case of a kafka cluster
failure, we will append changes to a local index, and then send them over
once the bus is back. If the ES cluster dies, one nice thing about kafka is
that one can request messages based on an offset, so when the cluster
recovers we could start consuming messages from the last point it had
consumed.
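Last-commit-wins resolution can be sketched as keeping, per document id, the update with the later commit timestamp, plus a deterministic tie-breaker (here the data-center name) so that every cluster converges on the same value whatever order the topic delivers updates in. `Update` and `LwwStore` are hypothetical names, and the sketch assumes wall clocks across data centers are reasonably synchronized, which is the main weakness of this strategy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replicated update: document id, payload, commit time, origin data center.
final class Update {
    final String id;
    final String source;
    final long commitMillis;
    final String dataCenter;

    Update(String id, String source, long commitMillis, String dataCenter) {
        this.id = id;
        this.source = source;
        this.commitMillis = commitMillis;
        this.dataCenter = dataCenter;
    }

    // True if this update should replace 'other' under last-write-wins.
    boolean beats(Update other) {
        if (commitMillis != other.commitMillis) {
            return commitMillis > other.commitMillis;
        }
        return dataCenter.compareTo(other.dataCenter) > 0; // deterministic tie-break
    }
}

// Hypothetical per-cluster store applying local and replicated updates alike.
final class LwwStore {
    private final Map<String, Update> docs = new HashMap<>();

    // Returns true if the update became the current value for its document.
    boolean apply(Update u) {
        Update current = docs.get(u.id);
        if (current == null || u.beats(current)) {
            docs.put(u.id, u);
            return true;
        }
        return false;
    }

    String get(String id) {
        Update u = docs.get(id);
        return u == null ? null : u.source;
    }
}
```

Applying the same three updates in opposite orders on two stores leaves both with the same winner.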

These are all ideas I'm working on right now. I'll probably have time to start
coding them soon. Thanks for all the support :slight_smile:

Cheers

On Tuesday, July 23, 2013 2:47:29 AM UTC-4, Jörg Prante wrote:

[...]

Yes, I once examined Kafka, and discovered that many of its components are
already there in Elasticsearch. For example, the activity stream is already
there as the ES translog (if you focus on indexing operations), and the ES
gateway is a useful persistent store mechanism. What I didn't like was the
single Kafka JVM and the Zookeeper infrastructure; it all adds complexity
beside ES.

For cross-cluster replication, I think the best approach is distributed log
replication. This is hard, because logged ES operations must be synchronized
by an external, logical notion of time (e.g. vector clocks) before they can be
used as a global event stream. A pubsub mechanism could then run as a service
at the primary shards of an index on the ES node, merging the translogs for an
external agent that has previously subscribed to the replication stream. The
vector clock is required for distributed time-machine-like behavior
(snapshots), assuming the translog is not deleted but kept for a certain time
window.
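For reference, a vector clock keeps one counter per node; comparing two clocks shows whether one event happened before the other or whether they are concurrent and need conflict resolution. This is a minimal immutable sketch with hypothetical node names; how such clocks would be attached to translog entries is not addressed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal immutable vector clock: one logical counter per node id.
final class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Long> counters;

    VectorClock() {
        this(new HashMap<>());
    }

    private VectorClock(Map<String, Long> counters) {
        this.counters = counters;
    }

    // Returns a new clock with this node's counter incremented (a local event).
    VectorClock tick(String node) {
        Map<String, Long> next = new HashMap<>(counters);
        next.merge(node, 1L, Long::sum);
        return new VectorClock(next);
    }

    // Element-wise maximum, used when receiving another node's state.
    VectorClock merge(VectorClock other) {
        Map<String, Long> next = new HashMap<>(counters);
        other.counters.forEach((node, c) -> next.merge(node, c, Long::max));
        return new VectorClock(next);
    }

    // Partial order: BEFORE/AFTER if one clock dominates, CONCURRENT if neither does.
    Order compare(VectorClock other) {
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        boolean less = false;
        boolean greater = false;
        for (String n : nodes) {
            long a = counters.getOrDefault(n, 0L);
            long b = other.counters.getOrDefault(n, 0L);
            if (a < b) less = true;
            if (a > b) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }
}
```

Two clusters that each accept a write without having seen the other's produce CONCURRENT clocks, which is the case where a policy such as last-commit-wins has to step in.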

Jörg

On Tue, Jul 23, 2013 at 3:55 PM, Vinicius Carvalho <viniciusccarvalho@gmail.com> wrote:

[...]

Hi!

Have there been any further explorations in the area of WAN replication?

I have ES clusters in multiple datacenters connected via high-speed private
network. I'm wondering if multi-master replication would be possible in
this environment or if we'd need some type of 'shovel' plugin like the one
described here to ship data between the DCs.

Thanks,
Matthew

On Tuesday, July 23, 2013 10:06:10 AM UTC-7, Jörg Prante wrote:

[...]

Have you seen the Tribe Node? This is a kind of a "merged state"
multi-master cluster.

Jörg

On Fri, Jun 27, 2014 at 1:39 AM, Matthew Parrott <matthewatabet@gmail.com> wrote:

[...]

Hey!

I have looked at tribes, but didn't look deeply because of this:

"The merged view cannot handle indices with the same name in multiple
clusters."

I'd like to have indexes replicated across datacenters. Is there a way to
accomplish that with tribes?

Thanks!

On Friday, June 27, 2014 2:29:46 AM UTC-7, Jörg Prante wrote:

[...]

I found this note:

Which mentions:
"Later on we plan on making cross data-center replication possible by
adding the ability to do incremental restores into a read-only index."

Is that feature still on the roadmap?

Thanks

On Friday, June 27, 2014 10:40:55 AM UTC-7, Matthew Parrott wrote:

[...]