CouchDB _river, ES failure and missing docs

Hi all!

We're having an issue similar
to https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/river$20missing/elasticsearch/Y8G4eu-sbrE/bNyopwUP_kgJ

We had a node failure, which we repaired and brought back up. ES rebalanced
itself and is all green (using 0.19.2). We're using the CouchDB river, which
generally works well. However, it missed some documents while we were down,
and now we have a gap that it doesn't seem to be filling in.

Is there a way to have the _river re-scan _changes or bootstrap it somehow?
We're not sure how to get the gaps filled in.

Thanks!

--

What you can probably do is modify the river's metadata. As far as I
remember, you will find there the last change number that Elasticsearch processed.
You can update that document and set the field to xxxx (1 if you want
to start from the beginning).
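
A minimal sketch of what such a reset could look like, in Python. It assumes the river keeps its checkpoint in the `_river` index, in a document with id `_seq` under a type named after the river, with the value stored at `couchdb.last_seq`. That layout, the river name `my_couch_river`, and the host are assumptions to verify on your own cluster first (for example by fetching the document and looking at it). The river probably reads the checkpoint only at startup, so it would likely need a restart for the change to take effect.

```python
import json
import urllib.request

ES = "http://localhost:9200"   # assumed Elasticsearch address
RIVER = "my_couch_river"       # hypothetical river name


def seq_doc_url(es=ES, river=RIVER):
    """URL of the document where the river is assumed to store its checkpoint."""
    return f"{es}/_river/{river}/_seq"


def seq_doc_body(last_seq):
    """JSON body recording `last_seq` as the last processed change."""
    return json.dumps({"couchdb": {"last_seq": str(last_seq)}})


def reset_checkpoint(last_seq, es=ES, river=RIVER):
    """Overwrite the checkpoint document (performs an HTTP PUT)."""
    req = urllib.request.Request(
        seq_doc_url(es, river),
        data=seq_doc_body(last_seq).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints what would be sent; call reset_checkpoint() to apply it.
    print(seq_doc_url())
    print(seq_doc_body(1))
```

Printing the URL and body before calling `reset_checkpoint` makes it easy to compare them with the document that is currently stored.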

Does it help?

David.

(In reply to JP Toto, james.p.toto@gmail.com, 18 December 2012 at 18:56.)

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

David,

Thanks for the reply! Do you mean the last_seq field? Right now it has the
value shown below. If I set it to an earlier id, will it duplicate any
documents that come after that point? I don't mind overwriting documents, but I
don't want to duplicate any.

Thanks for the help!

  • last_seq:
    109019340-g1AAAAXpeJyV1DssQ1EYB_AbxGCXGCQGm0TjuUk0kXq03nxEDdKvtwlVvVLtym4zGMT7zdktFpsEs1liY_N-FMf_u1un5rvLf7m__3nce07KcZyq6VLXqXU57mUSQZebA_PeQnYulpltaGgMxFNezo2ls4F0IpvCyyUxhyuMOU2yE5qoU0lxhiuJRgT3Xukw3CjXWPsruL-mEDcVwXB_XGfMseDucd3IcCfcSjQkOPijw3DD3G5tXnCUdRjuh8PGHAoevSzEjUUw3BET0YC_29W6keEGecraL3_kvA7DfXPSmH3Bg8rvDHfAOaI-weEb3Zrh-nnJ2g_BkR3dyHCf6TJn15hlf-L1Kg65B4mIEK1KgRfVFvRCIt6s3ZaCtvvCgpaiBe-QiG0cNCkYftDOYAcS0UN0JgV9x9qCMCTixdoL_6CeawteIRGbxlxLQceKdg-2IBFdRLdSMKS7pCC7IRFP1t75S1jUFjxDItaNefT_w3JtwQYkIkT0JgVja6rbDrITEmHxSMHkTPIfOl4Kww

(In reply to David Pilato, Tuesday, December 18, 2012, 12:59:13 PM UTC-5.)

--

Yes, that's the field. It sounds like you are using a BigCouch instance, aren't
you? If so, you have handled more than 100 million changes so far, right?

In that case, you have to find the right change id as given by BigCouch. I think
it looks something like xxxxxxx-BIGCOUCHADDON, where xxxxxxx is the real seq number
in CouchDB.
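
To make the shape of such an id concrete, here is a small Python sketch that splits a BigCouch-style sequence into its numeric prefix and the opaque remainder. The function name is hypothetical, and the sample value is a shortened, made-up stand-in for a real one. The prefix is probably only useful for a rough "earlier or later" comparison; when handing a value back to BigCouch or the river, it seems safest to use the complete string exactly as `_changes` returned it.

```python
def split_bigcouch_seq(seq):
    """Split 'NNN-opaque' into (integer prefix, opaque suffix).

    Only the first hyphen separates the two parts; the suffix may
    contain further hyphens and underscores of its own.
    """
    prefix, sep, suffix = seq.partition("-")
    if not sep or not prefix.isdigit() or not suffix:
        raise ValueError(f"not a BigCouch-style sequence: {seq!r}")
    return int(prefix), suffix


if __name__ == "__main__":
    # Shortened, made-up sample in the same shape as a real last_seq.
    number, opaque = split_bigcouch_seq("109019340-g1AAAA_x-y")
    print(number, opaque)
```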

David.

(In reply to JP Toto, james.p.toto@gmail.com, 18 December 2012 at 19:09.)

--

David, you're right about that! It's BigCouch - I forgot that detail :)

Thanks for the advice! We'll try this.

(In reply to David Pilato, Tuesday, December 18, 2012, 1:16:50 PM UTC-5.)

--

Resetting the id won't duplicate the subsequent docs in ES, right?

(In reply to David Pilato, Tuesday, December 18, 2012, 1:16:50 PM UTC-5.)

--

I think it will update the docs in Elasticsearch, because documents have the same
_id in BigCouch and Elasticsearch.

You can check that by taking an id from BigCouch, running
curl localhost:9200/index/type/YOURBIGCOUCHID, and seeing whether you can retrieve the
document from Elasticsearch.
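
The same check can be scripted. The Python sketch below builds the URL used in the curl command and asks Elasticsearch whether the document is there. The `index` and `type` path segments and the host are the placeholders from the command above, not real names. Older Elasticsearch releases report an `exists` flag in the response and later ones a `found` flag, so the sketch accepts either.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def es_doc_url(doc_id, index="index", doc_type="type", es="http://localhost:9200"):
    """Build the GET URL for one document; the id is percent-encoded."""
    return f"{es}/{index}/{doc_type}/{urllib.parse.quote(doc_id, safe='')}"


def is_indexed(doc_id, **kwargs):
    """Return True if Elasticsearch answers the GET with a document."""
    try:
        with urllib.request.urlopen(es_doc_url(doc_id, **kwargs)) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:   # document (or index) not there
            return False
        raise
    # Accept either flag name, depending on the Elasticsearch release.
    return bool(body.get("exists", body.get("found", False)))


if __name__ == "__main__":
    # Prints the URL only; is_indexed() performs the actual request.
    print(es_doc_url("YOURBIGCOUCHID"))
```

If a document that exists in BigCouch comes back under the same id, replaying its change should overwrite that document rather than create a second copy.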

I'm almost sure it will work.

(In reply to JP Toto, james.p.toto@gmail.com, 18 December 2012 at 19:21.)

--

Thanks, David. I really appreciate the suggestion. We're just trying to
figure out how to match update_seq to a document so we know where to
start from.
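
One possible way to do that matching, sketched in Python: each row of a `_changes` response carries a `seq` and the `id` of the document that changed, so walking the rows in order and checking each id against Elasticsearch gives the last sequence before the first missing document. The function name and the `is_indexed` callback are hypothetical, and the sketch assumes the rows are processed oldest first, which a clustered BigCouch may not strictly guarantee.

```python
def seq_before_first_gap(changes, is_indexed):
    """Return the seq of the last change before the first missing document.

    `changes` is the "results" list of a _changes response, oldest first.
    `is_indexed` is a callable taking a document id and returning a bool.
    Returns None if the very first row is already missing, and the last
    row's seq if nothing is missing.
    """
    previous = None
    for row in changes:
        # Deleted documents are not expected in the index, so skip the check.
        if not row.get("deleted") and not is_indexed(row["id"]):
            return previous
        previous = row["seq"]
    return previous


if __name__ == "__main__":
    # Made-up rows in the shape of a _changes "results" list.
    rows = [
        {"seq": "1-a", "id": "doc1"},
        {"seq": "2-b", "id": "doc2"},
        {"seq": "3-c", "id": "doc3", "deleted": True},
        {"seq": "4-d", "id": "doc4"},
    ]
    already_indexed = {"doc1", "doc2"}
    print(seq_before_first_gap(rows, already_indexed.__contains__))
```

The value returned is a candidate for last_seq: the river would then replay everything after it, overwriting documents that are already indexed and adding the missing ones.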

(In reply to David Pilato, Tuesday, December 18, 2012, 1:45:40 PM UTC-5.)

--