Is there a way to force a couchdb river update?

Andrius_Juozapaitis · March 27, 2012, 1:39pm

Hey all,

Looking for advice here. I am using couchdb as my data store, with
elasticsearch handling pretty much all the read operations. I've
encountered an issue with the near-real-time indexing through couchdb
river though:

1. My Spring MVC controller extracts data (say, a product list) using
   the elasticsearch Java API.
2. The user submits a new product.
3. The product is saved in couchdb.
4. The controller, following a redirect-after-post pattern, does an
   HTTP redirect to the product list (see 1.).

The problem here is that the controller in step 1. is querying
elasticsearch data which hasn't been updated yet by the couchdb river.
I am aware there are a number of potential solutions to this problem
(also storing the saved data in session, querying couchdb directly,
delaying the redirect, using ajax, etc).

The most transparent way, IMHO, would be to issue a request to
elasticsearch (or couchdb river?) that would block until the couchdb
river processes all the pending changes in the _changes feed, and only
then perform the http redirect in step 4. Is it possible to do this
through the existing APIs? Are there any other options I missed? In
other words, I'd be happy for any constructive feedback here.

Best regards,
Andrius

kimchy · March 27, 2012, 6:53pm

There isn't a way to wait until it gets indexed in elasticsearch; I mean,
the river does not expose it. I can't think of a really nice solution
except maybe to do a get for the doc in elasticsearch (without the _source,
i.e. empty fields), and then poll the doc with another get until the
_version changes...

Andrius_Juozapaitis · March 28, 2012, 12:50pm

Looking at the couchdb river code, it looks fairly simple to expose the
river state: just add a flag to the CouchdbRiver.Indexer.run() loop:

                idle = true;
                logger.info("finished indexing");
                String s;
                try {
                    s = stream.take();
                    logger.info("indexing...");
                    idle = false;

Since the stream is a blocking queue, the idle flag in the CouchdbRiver
instance would only be true if there are no pending changes in the feed.
The next question is: is there an easy way to expose additional river state
through the existing JSON API? _meta information, perhaps?

thanks in advance,
Andrius
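
A minimal, self-contained sketch of the idle-flag idea described above, assuming a consumer thread that drains a blocking queue the way the Indexer loop does. The class name IdleTrackingIndexer and the methods enqueue(), isIdle(), processedCount() and index() are hypothetical and are not part of the couchdb river; the real loop would index each change instead of calling a placeholder.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the river's indexer thread, not the actual river code.
final class IdleTrackingIndexer implements Runnable {
    private final BlockingQueue<String> stream = new LinkedBlockingQueue<>();
    private final AtomicInteger processed = new AtomicInteger();
    // volatile so that a thread reading the status sees the latest value
    private volatile boolean idle = false;

    // Simulates a change arriving from the _changes feed.
    void enqueue(String change) { stream.add(change); }

    boolean isIdle() { return idle; }

    int processedCount() { return processed.get(); }

    // Placeholder for the real indexing work.
    private void index(String change) { }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                idle = true;               // about to wait for the next change
                String s = stream.take();  // blocks while the queue is empty
                idle = false;              // a change is being processed
                index(s);
                processed.incrementAndGet();
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and let the thread exit.
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch the flag is also briefly true between two queued items, because it is set before every take(), so a caller should treat it as a hint rather than a guarantee.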

kimchy · March 29, 2012, 12:01pm

We can add the flag, or update a "status" JSON document that you can read,
but it's not enough, since it can't tell you whether the particular document
you are interested in has been processed or not.

Andrius_Juozapaitis · March 29, 2012, 12:12pm

Yes, it's by no means a one-size-fits-all solution. Then again, I gave your
initial suggestion another thought: whenever I do an insert/update/delete
in my service layer, I get back the doc id+revision. So the most
straightforward way would be to just poll ES until a query for that
id+revision returns a hit, then do the redirect and be done with it. Not
perfect, but easy to implement, and it will work just fine in my case.

Thanks!
Andrius
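
The polling approach can be sketched as a small helper, assuming the actual check against ES is supplied by the caller. IndexPoller and awaitIndexed are hypothetical names; the BooleanSupplier stands in for whatever query the service layer runs (id+revision, or a get that compares _version), so no particular ES client call is implied.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of any ES or river API.
final class IndexPoller {
    private IndexPoller() { }

    /**
     * Calls isIndexed repeatedly until it returns true or timeoutMillis elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean awaitIndexed(BooleanSupplier isIndexed, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (isIndexed.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the change did not show up in time
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In a controller this would be called right before the redirect, for example awaitIndexed(() -> revisionIsSearchable(id, rev), 2000, 50), where revisionIsSearchable is a hypothetical method; on timeout the redirect can simply go ahead.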

Alexander_Petrichkov · April 10, 2013, 9:20pm

Solution: put/delete the doc in the db and also put/delete it in ES.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Alexander_Petrichkov · April 27, 2013, 2:52pm

After our tests we can see that just putting/deleting a document in the
elasticsearch index doesn't work.
We delete an object from the list (delete it from the db and from
elasticsearch via the delete API, using an HTTP request).
Then we redirect to the list page and the deleted object is still there.
I think the HTTP request finishes before the index is really updated,
presumably for performance reasons.
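
One thing that may explain this: search in Elasticsearch is near-real-time, so a successful index or delete response does not by itself mean the change is already visible to searches. The index and delete APIs accept a refresh=true query parameter that forces a refresh as part of the request, at some cost in indexing throughput. A minimal sketch using the JDK HTTP client (Java 11+); the class name, the method name and the sample URL in the note below are assumptions, not taken from this thread.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that only builds the request; sending it is left to the caller.
final class RefreshingDelete {
    private RefreshingDelete() { }

    /**
     * Builds a DELETE request for the given document URL with refresh=true appended.
     * documentUrl is assumed to carry no query string of its own.
     */
    static HttpRequest build(String documentUrl) {
        URI uri = URI.create(documentUrl + "?refresh=true");
        return HttpRequest.newBuilder(uri).DELETE().build();
    }
}
```

The request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), for example for a made-up document URL such as http://localhost:9200/products/product/42.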

Alexander_Petrichkov · April 27, 2013, 3:25pm

Look at the CouchDB Document API (I think we need the same thing implemented
in Elasticsearch for create, update and delete logic):

Update Existing Document
Request Method: PUT
Request URI: /[db_name]/[doc_id]
Request Headers: X-Couch-Full-Commit: true (optional). Ensures that the
    document has synced to disk before returning success.
Request Body: The document itself as a JSON object. It must include the
    _rev property, with the revision number of the document the update is
    based on as the value.
Request Parameters: None
Description: Updates an existing document and replaces it with a new
    revision.
Sample Request URI: http://127.0.0.1:5984/employees/126
Sample Response: {"ok":true,"id":"126","rev":"2-4058198378"}

Alexander_Petrichkov · April 28, 2013, 1:56pm

Yep. We have made a temporary fix in our code until this is implemented in
ES.
You can check that the index has been updated.
It takes some time, but tests show that the problem is no longer reproducible.
