Documents from couchdb river aren't deleted


(botay) #1

Hello,

After deleting a document in CouchDB, I got the following messages in
the Elasticsearch log file. It looks as if the document is deleted, but
it remains in the index.

Does anybody have an idea?

tom

[2012-03-19 16:41:44,352][TRACE][river.couchdb ] [Lorvex] [couchdb][cm8_river] [couchdb] {"seq":271963,"id":"2329616,de_DE","changes":[{"rev":"4-ae8660826e9706e0965039eb7365fbed"}],"deleted":true,"doc":{"_id":"2329616,de_DE","_rev":"4-ae8660826e9706e0965039eb7365fbed","_deleted":true}}
[2012-03-19 16:41:44,353][TRACE][river.couchdb ] [Lorvex] [couchdb][cm8_river] processing [delete]: [cm8]/[cm1]/[2329616,de_DE]
[2012-03-19 16:41:45,353][TRACE][river.couchdb ] [Lorvex] [couchdb][cm8_river] processing [_seq ]: [_river]/[cm8_river]/[_seq], last_seq [271963]
[2012-03-19 16:41:45,354][TRACE][index.shard.service ] [Lorvex] [_river][0] index [Document<stored,binary,omitNorms,indexOptions=DOCS_ONLY<_source:[B@1262f7c> indexed,omitNorms,indexOptions=DOCS_ONLY<_type:cm8_river> stored,indexed,tokenized,omitNorms<_uid:> indexed,tokenized<couchdb.last_seq:271963> indexed,tokenized<_all:>>]
[2012-03-19 16:41:45,356][TRACE][index.shard.service ] [Lorvex] [cm8][0] delete [cm1#2329616,de_DE]
[2012-03-19 16:41:46,109][TRACE][index.shard.service ] [Lorvex] [cm8][0] refresh with waitForOperations[false]
[2012-03-19 16:41:46,211][TRACE][index.shard.service ] [Lorvex] [_river][0] refresh with waitForOperations[false]

Sorry for the duplicate post.


(Shay Banon) #2

How can you tell that the document is not deleted? Based on the shard
logging you enabled, it looks like it was deleted in Elasticsearch...



(botay) #3

Hello,

It looks like a problem with the _type attribute. Deleting the entry
using the correct type works:

curl -X DELETE http://localhost:9200/cm8/article/2329616,de_DE

{"ok":true,"found":true,"_index":"cm8","_type":"article","_id":"2329616,de_DE","_version":14}

Logfile shows:

[2012-03-20 11:59:34,937][TRACE][index.shard.service ] [Kiber the Cruel] [cm8][0] delete [article#2329616,de_DE]
[2012-03-20 11:59:35,022][TRACE][index.shard.service ] [Kiber the Cruel] [cm8][0] refresh with waitForOperations[false]

In contrast, a delete issued by the river produces these lines:

[2012-03-20 11:57:04,557][TRACE][index.shard.service ] [Kiber the Cruel] [cm8][0] delete [cm1#2329616,de_DE]
[2012-03-20 11:57:05,003][TRACE][index.shard.service ] [Kiber the Cruel] [cm8][0] refresh with waitForOperations[false]

cm1 is not the _type but the name of the CouchDB database.
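
The shard-level delete entries above appear to name the document as type#id, which is why the river's delete targets cm1 while the manual delete targets article. As a minimal sketch of reading such an entry (the class UidParts is hypothetical and not part of Elasticsearch, and it assumes the type itself contains no '#'):

```java
// Hypothetical helper, not part of Elasticsearch: splits a "type#id" string,
// as shown in the shard TRACE log, into its type and id parts.
final class UidParts {
    final String type;
    final String id;

    private UidParts(String type, String id) {
        this.type = type;
        this.id = id;
    }

    // Assumption: the type contains no '#', so the first '#' is the separator.
    static UidParts parse(String uid) {
        int sep = uid.indexOf('#');
        if (sep < 0) {
            throw new IllegalArgumentException("not a type#id string: " + uid);
        }
        return new UidParts(uid.substring(0, sep), uid.substring(sep + 1));
    }

    public static void main(String[] args) {
        UidParts river = UidParts.parse("cm1#2329616,de_DE");
        UidParts manual = UidParts.parse("article#2329616,de_DE");
        // Same id, different type: the two deletes address different documents.
        System.out.println(river.type + " vs " + manual.type + " for id " + river.id);
    }
}
```

Both entries carry the same id, so only the type differs between the delete that worked and the one that did not.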

tom


(botay) #4

Hello again,

Is there any further help for me? It's a very urgent problem for my system.

Some more information:

The type is not part of the id. Deleting an entry using only the id is a valid operation in my system. The type is built dynamically by a river script. For deleted items the river delivers no document content, so I have no way to determine the right type of the item.

Please help once more.

best regards
tom



(Shay Banon) #5

In this case it's a problem, since the type is needed in order to delete the
doc using the delete API. The type used in this case (if you can't extract
it from the id of the doc you get from couch) is the one you created the
river with...
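
The fallback described here can be pictured roughly as follows. This is only a sketch under assumptions, not the river's actual code: the class TypeResolver, the method resolveType, and the use of a "_type" key in the script context are all illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names are assumptions, not the river's real API.
final class TypeResolver {
    // Uses the type the script placed in the context, if any;
    // otherwise falls back to the type the river was created with.
    static String resolveType(Map<String, Object> ctx, String riverType) {
        Object scripted = ctx.get("_type");
        return scripted != null ? scripted.toString() : riverType;
    }

    public static void main(String[] args) {
        Map<String, Object> deleteCtx = new HashMap<>();
        deleteCtx.put("deleted", Boolean.TRUE);
        // A delete change carries no document body, so no type was derived.
        System.out.println(resolveType(deleteCtx, "cm1"));   // prints "cm1"

        deleteCtx.put("_type", "article");
        System.out.println(resolveType(deleteCtx, "cm1"));   // prints "article"
    }
}
```

With a script that derives the type from document fields, a delete change gives it nothing to work with, so the fallback value is used and the delete misses the document.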



(botay) #6

Hi,

OK, I'll try to modify the delete handling in the CouchDB river code so that it deletes an id for all possible types. Are there any tips for setting up the build environment?

best regards
tom



(David Pilato) #7

Heya,

I'm not sure that this change will fit every use case.
I suggest making it an option in the CouchDB river metadata.

HTH
David ;-)
Twitter : @dadoonet / @elasticsearchfr



(botay) #8

Hi,

Here is my solution. The river now accepts a comma-separated list of types as the type value, in my case 'article,gallery,SHOFI,SHOFICAT'. The river script sets this list as the type whenever a delete happens. The river then loops over the listed types and tries to delete the given id under each of them.

*** src/main/java/org/elasticsearch/river/couchdb/CouchdbRiver.java.orig 2012-03-26 16:14:30.597628315 +0200
--- src/main/java/org/elasticsearch/river/couchdb/CouchdbRiver.java 2012-03-26 13:01:08.450282666 +0200
***************
*** 247,256 ****
          } else if (ctx.containsKey("deleted") && ctx.get("deleted").equals(Boolean.TRUE)) {
              String index = extractIndex(ctx);
              String type = extractType(ctx);
!             if (logger.isTraceEnabled()) {
!                 logger.trace("processing [delete]: [{}]/[{}]/[{}]", index, type, id);
              }
-             bulk.add(deleteRequest(index).type(type).id(id).routing(extractRouting(ctx)).parent(extractParent(ctx)));
          } else if (ctx.containsKey("doc")) {
              String index = extractIndex(ctx);
              String type = extractType(ctx);
--- 247,268 ----
          } else if (ctx.containsKey("deleted") && ctx.get("deleted").equals(Boolean.TRUE)) {
              String index = extractIndex(ctx);
              String type = extractType(ctx);
!             if (type.contains(",")) {
!                 String[] types = type.split(",");
!                 for (int it=types.length - 1; it >= 0; it--)
!                 {
!                     if (logger.isTraceEnabled()) {
!                         logger.trace("processing [delete]: [{}]/[{}]/[{}]", index, types[it], id);
!                     }
!                     bulk.add(deleteRequest(index).type(types[it]).id(id).routing(extractRouting(ctx)).parent(extractParent(ctx)));
!                 }
!             }
!             else {
!                 if (logger.isTraceEnabled()) {
!                     logger.trace("processing [delete]: [{}]/[{}]/[{}]", index, type, id);
!                 }
!                 bulk.add(deleteRequest(index).type(type).id(id).routing(extractRouting(ctx)).parent(extractParent(ctx)));
              }
          } else if (ctx.containsKey("doc")) {
              String index = extractIndex(ctx);
              String type = extractType(ctx);
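
The core of the patch, turning one comma-separated type value into one delete per type, can be tried outside the river with a small standalone sketch. The class DeleteTargets and its expand method are hypothetical names used only for illustration. Unlike the patch, the sketch walks the list front to back and trims whitespace around each type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone illustration of the patch's type-list expansion.
final class DeleteTargets {
    // Returns one "index/type/id" delete target per type in typeSpec.
    // A typeSpec without a comma yields a single target.
    static List<String> expand(String index, String typeSpec, String id) {
        List<String> targets = new ArrayList<>();
        for (String type : typeSpec.split(",")) {
            String trimmed = type.trim();
            if (!trimmed.isEmpty()) {
                targets.add(index + "/" + trimmed + "/" + id);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Values taken from this thread: index cm8, the four types, and the doc id.
        for (String target : expand("cm8", "article,gallery,SHOFI,SHOFICAT", "2329616,de_DE")) {
            System.out.println("DELETE " + target);
        }
    }
}
```

Only one of the resulting deletes is expected to find a document; the others should come back as not found, which is the cost of not knowing the type at delete time.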

best regards
tom


