Elasticsearch missing data from CouchDB


(zuhaib-2) #1

Hello,

I have set up a test environment with CouchDB and Elasticsearch to see if we
can use it in place of couchdb-lucene, but I am seeing a major problem.

After a few days of running in my VM, Elasticsearch suddenly stopped seeing
changes from CouchDB. Seeing nothing in the log, I restarted Elasticsearch.
It then started picking up newly inserted documents, but none of the docs
inserted prior to that got indexed at all. Checking the changes feed on
CouchDB with the seq id stored in Elasticsearch shows no changes, and that
seq is current according to CouchDB. My question is: how can I easily get
the old documents that Elasticsearch did not index reindexed?
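
The check described above can be sketched roughly as follows. This is only an illustration, not the river plugin's own code. The host URLs, the database name `mydb`, and the river name `my_river` are placeholders, and the location of the river's checkpoint document (`_river/<river_name>/_seq`) is an assumption that should be verified against the plugin version in use. The `_changes` endpoint and its `since` parameter are standard CouchDB.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoints; adjust to the actual setup.
COUCH_URL = "http://localhost:5984"
ES_URL = "http://localhost:9200"


def changes_url(couch_url, db, since):
    """Build a CouchDB _changes URL listing changes after sequence `since`."""
    return f"{couch_url}/{db}/_changes?" + urlencode({"since": since})


def river_seq_url(es_url, river_name):
    # Assumption: the river stores its checkpoint in the _river index
    # under <river_name>/_seq. Verify this for your plugin version.
    return f"{es_url}/_river/{river_name}/_seq"


def pending_ids(changes_body):
    """Return the ids of documents listed in a parsed _changes response."""
    return [row["id"] for row in changes_body.get("results", [])]


def fetch_json(url):
    """Fetch a URL and parse the response body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Inspect the checkpoint the river believes it has reached.
    print(fetch_json(river_seq_url(ES_URL, "my_river")))
    # List everything CouchDB has recorded since the beginning (seq 0).
    body = fetch_json(changes_url(COUCH_URL, "mydb", 0))
    print(pending_ids(body))
```

Comparing the ids returned for `since=0` with what is actually in the index would show which documents were skipped, even when the feed reports no changes after the stored checkpoint.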

A follow-up to that: in production, let's say the connection from my
CouchDB to Elasticsearch goes down while docs are still being inserted into
CouchDB, and then after some time the link comes back up. Should all the
information added to CouchDB in the meantime be indexed by Elasticsearch,
or will I have the problem I see above?
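
For reference, the behaviour one would hope for after an outage can be modelled with a toy example. The classes below are hypothetical in-memory stand-ins and say nothing about what the river plugin actually does. They only illustrate that a consumer which resumes from its last processed sequence number (as CouchDB's `_changes?since=N` allows) picks up everything written while the link was down, provided the checkpoint never moves past a document that was not indexed.

```python
class FakeChangesFeed:
    """In-memory stand-in for a CouchDB _changes feed (hypothetical)."""

    def __init__(self):
        self.rows = []  # list of (seq, doc_id), seq starts at 1

    def insert(self, doc_id):
        self.rows.append((len(self.rows) + 1, doc_id))

    def changes(self, since=0):
        # Mirrors _changes?since=N: only rows with seq greater than N.
        return [(seq, doc_id) for seq, doc_id in self.rows if seq > since]


class Indexer:
    """Toy consumer that checkpoints the last sequence it has indexed."""

    def __init__(self):
        self.last_seq = 0
        self.indexed = []

    def poll(self, feed):
        for seq, doc_id in feed.changes(since=self.last_seq):
            self.indexed.append(doc_id)
            self.last_seq = seq  # advance only after the doc is indexed


feed = FakeChangesFeed()
indexer = Indexer()

feed.insert("a")
indexer.poll(feed)   # link is up: "a" gets indexed

feed.insert("b")     # link is down: writes continue on the CouchDB side
feed.insert("c")

indexer.poll(feed)   # link is back: resume from the checkpoint
print(indexer.indexed, indexer.last_seq)
```

In this model a gap can only appear if the checkpoint is advanced past documents that never reached the index, which matches the symptom described above, where the stored seq is current but older docs are missing.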

Thanks
Zuhaib


(David Pilato) #2

I think you are hitting the same issue as here: https://groups.google.com/forum/?hl=fr&fromgroups&nomobile=true#!msg/elasticsearch/WWNIATEqWho/ZZzwL8v77xYJ

David

On 5 August 2012 at 04:01, zuhaib zsiddique@atlassian.com wrote:

Hello,

I have set up a test environment with CouchDB and Elasticsearch to see if we can use it in place of couchdb-lucene, but I am seeing a major problem.

After a few days of running in my VM, Elasticsearch suddenly stopped seeing changes from CouchDB. Seeing nothing in the log, I restarted Elasticsearch. It then started picking up newly inserted documents, but none of the docs inserted prior to that got indexed at all. Checking the changes feed on CouchDB with the seq id stored in Elasticsearch shows no changes, and that seq is current according to CouchDB. My question is: how can I easily get the old documents that Elasticsearch did not index reindexed?

A follow-up to that: in production, let's say the connection from my CouchDB to Elasticsearch goes down while docs are still being inserted into CouchDB, and then after some time the link comes back up. Should all the information added to CouchDB in the meantime be indexed by Elasticsearch, or will I have the problem I see above?

Thanks
Zuhaib


(zuhaib-2) #3

That seems close but not the same. In that case it seems the river plugin
would actually crash. In my case the CouchDB river is still running, but it
seems new data does not get pulled in if ES was down or the link was cut.

On Saturday, August 4, 2012 9:06:56 PM UTC-7, David Pilato wrote:

I think you are hitting the same issue as here:
https://groups.google.com/forum/?hl=fr&fromgroups&nomobile=true#!msg/elasticsearch/WWNIATEqWho/ZZzwL8v77xYJ

David


