Race condition

I'm using CouchDB as the storage for a Node.js app I've written. I was using
the CouchDB river plugin to get data from CouchDB into Elasticsearch, but I
wanted to perform some substantial manipulation of the different document
types before they were sent to ES, so I wrote my own 'indexer' that listens
to the CouchDB changes feed, munges the documents, and then sends them to ES.

One area of my application queries ES to display a table of users. Each row
in the table has a 'delete' link to delete the user. When I click the link,
I'm taken to a separate confirmation page. When I confirm the deletion, my
application performs a 'soft delete' in CouchDB. As soon as the 'soft
delete' is recorded by CouchDB, my application redirects to the page that
displays the table of users again.

I see a console message telling me my indexer ran immediately after the
soft delete and before any data is sent to my application. That should
mean the updated copy of the document has been put into ES, but I can't be
certain of the timing of everything. When my application queries ES for a
list of users, it filters out documents that have been soft deleted, so I
shouldn't get the soft deleted document back.

However, when my application redirects to the table of users and queries ES
immediately after a soft delete, ES still returns the document I just soft
deleted. All subsequent queries to ES do not return the soft deleted
document. I suspect the indexer is sending the change to ES, and ES is
still processing the change when the request for the updated user list
comes in. ES answers the query, finishes processing the change, and all
future requests work properly. I'm not sure if that's right though.

Have you run into this before? I'm considering my options for getting
around this. Fundamentally, having two sets of data, one in CouchDB and one
in ES, is less than ideal. Right now the best solution seems to be to get
rid of the indexer and have my application write each change to both
CouchDB and ES itself before it redirects. Does the river have a way of
dealing with this issue?
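
To make that last idea concrete, here is a rough sketch of what I have in
mind. It is only an outline under assumptions: `couch`, `es`, and
`redirect` are hypothetical objects passed in by the caller (not real
client library APIs), and the index name 'users' and the '/users' path are
placeholders.

```javascript
// Sketch only: `couch`, `es`, and `redirect` are hypothetical collaborators
// supplied by the caller; they stand in for whatever clients the app uses.
async function softDeleteUser(userId, { couch, es, redirect }) {
  // 1. Record the soft delete in CouchDB and get the updated document back.
  const doc = await couch.markDeleted(userId);
  // 2. Push the updated document to Elasticsearch.
  await es.index('users', userId, doc);
  // 3. Ask ES to make the change searchable before the next query runs.
  await es.refresh('users');
  // 4. Only now send the browser back to the user table.
  return redirect('/users');
}
```

The point is just the ordering: the redirect happens only after both
stores have acknowledged the change.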

Thanks,

Troy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I think it's because Elasticsearch is a near-real-time search engine.
By default an index is refreshed about once per second, so there can be up to roughly 1 second between the moment you send the delete (or update) request and the moment the change is visible to searches.

You can call _refresh after the delete if you want to make sure the change is searchable before you run the next query.
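
A minimal sketch of that refresh call from Node.js, assuming Node 18+ (for the built-in `fetch`), Elasticsearch listening on http://localhost:9200, and an index named "users". The host, the index name, and the helper names `refreshUrl` and `refreshIndex` are assumptions for illustration; only the `POST /{index}/_refresh` endpoint comes from Elasticsearch itself.

```javascript
// Assumed base URL of the Elasticsearch node; adjust for your setup.
const ES_URL = 'http://localhost:9200';

// Build the URL of the refresh endpoint for one index.
function refreshUrl(baseUrl, indexName) {
  return `${baseUrl}/${encodeURIComponent(indexName)}/_refresh`;
}

// Ask Elasticsearch to refresh the index so that earlier writes
// become visible to searches. Rejects on a non-2xx response.
async function refreshIndex(indexName, baseUrl = ES_URL) {
  const res = await fetch(refreshUrl(baseUrl, indexName), { method: 'POST' });
  if (!res.ok) {
    throw new Error(`refresh failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

The indexer would call `await refreshIndex('users')` right after sending the updated document. Refreshing on every write has a cost, so it is probably best reserved for low-volume operations like this one.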

My 2 cents

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to the message from troy@scriptedmotion.com, 15 March 2013 at 22:32.)


A few possible options:

  1. When the delete confirmation page sends the request to the server, the server should block until it has confirmed that the delete has been applied successfully in both CouchDB and ES. From the user's point of view, the operation should behave as if it were atomic.

  2. When the server fetches user records from ES, filter them against the active user ids in CouchDB. Or, for better performance, filter against only the user ids that were deleted in CouchDB recently, perhaps in the last 15 seconds.

  3. Implement a trick in the UI. After the user is deleted, store that user id somewhere, such as in a cookie with a lifetime of perhaps 5 seconds at most. On the next page, which shows the latest list of users, have the page filter that particular user record out of the server response before displaying the list.

Be careful with #3, because two separate end users could reach the users display page at the same time, and only the one who performed the delete would see the filtered list, since the trick runs in his browser. If at all possible, do all required filtering on the server side.
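
Option 2 could look roughly like the sketch below. It assumes the server already has the search hits from ES (each with an `_id`) and a list of recent soft deletes read from CouchDB. The function name `filterRecentlyDeleted` and the `{ id, deletedAt }` shape of the deletion records are made up for illustration; timestamps are in milliseconds.

```javascript
// Hypothetical helper: drop search hits whose id was soft deleted
// within the last `windowMs` milliseconds (15 seconds by default).
function filterRecentlyDeleted(hits, deletions, now, windowMs = 15000) {
  // Collect the ids of deletions that fall inside the time window.
  const recent = new Set(
    deletions
      .filter((d) => now - d.deletedAt <= windowMs)
      .map((d) => d.id)
  );
  // Keep only the hits that were not recently deleted.
  return hits.filter((hit) => !recent.has(hit._id));
}

// Usage with made-up data: u2 was deleted 5 seconds ago, u3 50 seconds ago.
const hits = [{ _id: 'u1' }, { _id: 'u2' }, { _id: 'u3' }];
const deletions = [
  { id: 'u2', deletedAt: 95000 },
  { id: 'u3', deletedAt: 50000 },
];
const visible = filterRecentlyDeleted(hits, deletions, 100000);
console.log(visible.map((h) => h._id)); // [ 'u1', 'u3' ]
```

Only u2 is removed here, because u3's deletion is older than the window; by then ES should already be filtering it out on its own.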

(In reply to the message from troy@scriptedmotion.com, Mar 15, 2013 at 2:32 PM.)
