Is it possible to control the last_seq of a new or existing couchdb river?


(Brian 'Phunk' Gadoury) #1

I'm trying to figure out whether it is possible to create a new river with a
specified last_seq value. Alternatively, I'd like to know whether it's
possible to modify the last_seq value of an existing river and get it to
start processing from that last_seq.

Our use case is that we have multiple indices that have the same documents
but slightly different schemas. Each index has its own river pointed at our
main BigCouch database. We control which index gets used by our application
using an index alias.

We always want our river named "live_river" to point at whatever index our
alias points at. We accomplish this by just deleting our "live_river" and
creating a new "live_river" whenever we adjust our index alias to put a new
index into production use.
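
For readers following along, the switch-over described above can be sketched roughly as the three requests below. This is only an illustration under assumptions: the alias, index and database names ("products", "products_v1", "products_v2", "main_db") are hypothetical, and the river `_meta` body follows the layout of the couchdb river README as best I recall it, so check it against your plugin version. The function only builds the requests; it does not send anything.

```python
import json


def plan_live_switch(alias, old_index, new_index,
                     river="live_river", couch_db="main_db"):
    """Return the (method, path, body) steps for moving the alias and
    recreating the river. All names passed in are placeholders."""
    # Atomically move the alias from the old index to the new one.
    alias_actions = {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}
    # Assumed river definition; host/port/db are illustrative values.
    river_meta = {
        "type": "couchdb",
        "couchdb": {"host": "localhost", "port": 5984, "db": couch_db},
        "index": {"index": new_index},
    }
    return [
        ("POST", "/_aliases", alias_actions),
        ("DELETE", f"/_river/{river}/", None),
        ("PUT", f"/_river/{river}/_meta", river_meta),
    ]


if __name__ == "__main__":
    for method, path, body in plan_live_switch(
            "products", "products_v1", "products_v2"):
        print(method, path, json.dumps(body) if body else "")
```

The last step is the expensive one: a river created this way starts without a checkpoint and so replays the whole changes feed.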

However, recreating that river means it will re-index all 4.8 million (and
growing) documents needlessly. This no-op overhead is what I'm trying to
avoid.

I've tried manually updating the _seq document of an existing river with
curl localhost:9200/_river/live_river/_seq -d '{"couchdb":{"last_seq":"100"}}'
The request succeeds and updates that doc, but it has no effect on the
behavior of the live_river.
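
For anyone who wants to reproduce the attempt from a script, here is a minimal Python sketch of the same request. The helper name `build_seq_update` and its defaults are my own, not part of any Elasticsearch client; the URL and body are the ones from the command above. It only builds the request object, so nothing is sent unless you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_seq_update(last_seq, river="live_river",
                     base_url="http://localhost:9200"):
    """Build (but do not send) the request that overwrites the river's
    _seq document. Hypothetical helper for illustration only."""
    body = json.dumps({"couchdb": {"last_seq": str(last_seq)}}).encode("utf-8")
    # A Request carrying data defaults to POST, like curl with -d.
    return urllib.request.Request(
        f"{base_url}/_river/{river}/_seq",
        data=body,
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_seq_update(100)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```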

Any ideas?

-Brian

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Looking at the CouchDB river source code, it seems that updating last_seq has no effect under continuous load, because the value gets overwritten at https://github.com/elasticsearch/elasticsearch-river-couchdb/blob/master/src/main/java/org/elasticsearch/river/couchdb/CouchdbRiver.java#L374

I think the only way to handle that is to modify the source code. On another river project, someone asked me to implement _stop / _start endpoints that pause and resume the river.
Something like that could help here to update last_seq.
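
To make the overwrite concrete, here is a toy Python model of the behavior. It is not the real CouchdbRiver code: `ToyRiver` and the dict standing in for the `_river` index are invented for illustration, and the idea that the river reads the checkpoint only once at startup is my assumption. The part taken from the source is just that the running river writes its own last_seq back as it indexes.

```python
class ToyRiver:
    """Toy stand-in for a running river; NOT the real CouchdbRiver."""

    def __init__(self, store):
        self.store = store
        # Assumption: the stored checkpoint is read once, at startup.
        seq_doc = store.get("_seq", {})
        self.last_seq = seq_doc.get("couchdb", {}).get("last_seq")

    def index_change(self, seq):
        # Each processed change rewrites the checkpoint from the
        # river's in-memory state, ignoring what is currently stored.
        self.last_seq = seq
        self.store["_seq"] = {"couchdb": {"last_seq": seq}}


if __name__ == "__main__":
    store = {}                      # stands in for the _river index
    river = ToyRiver(store)
    river.index_change("4800000")   # river has caught up

    # Manual edit, like the request in the original question.
    store["_seq"] = {"couchdb": {"last_seq": "100"}}

    river.index_change("4800001")   # next change arrives
    print(store["_seq"])            # the manual value is gone
```

In this model a manual edit only survives if the river is not running when it is made, which is why a pause/resume hook would be needed.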

Maybe you should open an issue in the couchdb river project?

HTH

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 5 Sep 2013, at 07:17, Brian Gadoury bgadoury@endpoint.com wrote:

I'm trying to figure out if it is possible to create a new river with a specified last_seq value. Alternatively, if it's possible to modify the last_seq value of an existing river and get it to start processing from that last_seq.

Our use case is that we have multiple indices that have the same documents but slightly different schemas. Each index has its own river pointed at our main BigCouch database. We control which index gets used by our application using an index alias.

We always want our river named "live_river" to point at whatever index our alias points at. We accomplish this by just deleting our "live_river" and creating a new "live_river" whenever we adjust our index alias to put a new index into production use.

However, recreating that river means it will re-index all 4.8 million (and growing) documents needlessly. This no-op overhead is what I'm trying to avoid.

I've tried manually updating an existing river with localhost:9200/_river/live_river/_seq -d '{"couchdb":{"last_seq":"100"}}' and it updates that doc, but it has no effect on the behavior of the live_river.

Any ideas?

-Brian



(Brian 'Phunk' Gadoury) #3

Thanks for the reply, David.

I'm quite embarrassed that I didn't think to check the source myself. I'll
open an issue and hopefully see you there. :slight_smile:

-Brian

On Thursday, September 5, 2013 12:15:33 AM UTC-6, David Pilato wrote:

Looking at couchdb river source code, it seems that updating last_seq will
have no effect under a continuous load because it will be erased by
https://github.com/elasticsearch/elasticsearch-river-couchdb/blob/master/src/main/java/org/elasticsearch/river/couchdb/CouchdbRiver.java#L374


