Duplicate _id/_type within the same index


(Max M.) #1

We apparently ended up with two documents sharing the same _id/_type pair
within the same index. Here is a _search query's output:

GET
"http://192.168.72.55:19301/kqi.docubank.document.documatica_checker.tdoc_checker/docs/_search?q=piieaimobdcckbghkadffhdlbmkmnpcd&pretty=1&fields="
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.1875,
    "hits" : [ {
      "_index" : "kqi.docubank.document.documatica_checker.tdoc_checker",
      "_type" : "docs",
      "_id" : "piieaimobdcckbghkadffhdlbmkmnpcd",
      "_score" : 0.1875
    }, {
      "_index" : "kqi.docubank.document.documatica_checker.tdoc_checker",
      "_type" : "docs",
      "_id" : "piieaimobdcckbghkadffhdlbmkmnpcd",
      "_score" : 0.057534903
    } ]
  }
}
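
For anyone wanting to check other search responses for the same symptom, here is a minimal sketch in Python. It assumes only the response shape shown above; the function name find_duplicate_hits and the trimmed sample are illustrative and not part of Elasticsearch.

```python
from collections import Counter


def find_duplicate_hits(response):
    """Return the (_index, _type, _id) triples that occur more than once in the hits."""
    hits = response.get("hits", {}).get("hits", [])
    counts = Counter((h["_index"], h["_type"], h["_id"]) for h in hits)
    return sorted(key for key, n in counts.items() if n > 1)


# Trimmed copy of the response from this post (hypothetical variable name).
INDEX = "kqi.docubank.document.documatica_checker.tdoc_checker"
DOC_ID = "piieaimobdcckbghkadffhdlbmkmnpcd"
sample = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": INDEX, "_type": "docs", "_id": DOC_ID, "_score": 0.1875},
            {"_index": INDEX, "_type": "docs", "_id": DOC_ID, "_score": 0.057534903},
        ],
    }
}

duplicates = find_duplicate_hits(sample)
print(duplicates)
```

An empty list would mean every hit has a distinct _index/_type/_id triple, which is the expected situation.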

It was my understanding that this situation should not be possible.
Currently, all updates affect only one of the two documents.
We're using version 0.17.8 in a cluster of two instances. Indices were
initially created with version 0.11. The first copy of the document was
indexed before migrating to 0.17.8.

Is there anything I could investigate to understand what might have
happened?

Max


(Shay Banon) #2

Are you using custom routing or parent when indexing? You can ask for
fields=_routing to get the routing value back as well.
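
As a rough illustration of that suggestion, the sketch below only builds the request URL with fields=_routing; it does not send anything. The host, index, type, and id are the ones from the original post, while the helper name build_search_url is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base, index, doc_type, query, fields):
    """Build a URI search request that asks for the given stored fields."""
    params = urlencode({"q": query, "pretty": 1, "fields": ",".join(fields)})
    return f"{base}/{index}/{doc_type}/_search?{params}"


url = build_search_url(
    "http://192.168.72.55:19301",
    "kqi.docubank.document.documatica_checker.tdoc_checker",
    "docs",
    "piieaimobdcckbghkadffhdlbmkmnpcd",
    ["_routing"],
)
print(url)
```

If the two copies had been indexed with different routing values, each hit should come back with its own _routing value, which would explain how they landed on different shards.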


(Max M.) #3

On Wednesday, November 23, 2011 4:10:30 PM UTC+1, kimchy wrote:

Are you using custom routing or parent when indexing? You can ask for
fields=_routing to get the routing value back as well.

I have never provided any routing. Adding fields=_routing doesn't have
any effect; that is, the query doesn't return any extra fields.

I also noticed another weird behavior. Our documents have a field called
"indexing_time". If I add it to the "fields" attribute, and repeat the same
query a few times, that field is not always included in the response. In
fact, it is included every other time. (It must have to do with our
cluster comprising two servers.)
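
To make the "every other time" pattern easier to record, here is a small sketch that repeats one query and notes, per hit, whether the requested field came back. It assumes that requested fields are returned under a "fields" key on each hit; run_query is a hypothetical callable standing in for the real HTTP request, and the demo below uses a fake that alternates between two canned responses.

```python
from itertools import cycle


def field_presence_pattern(run_query, field, attempts=6):
    """Run the same query `attempts` times; for each run, list whether each hit has `field`."""
    pattern = []
    for _ in range(attempts):
        response = run_query()
        hits = response.get("hits", {}).get("hits", [])
        pattern.append([field in hit.get("fields", {}) for hit in hits])
    return pattern


# Fake responses imitating two servers that answer in turn (illustrative values only).
with_field = {"hits": {"hits": [{"_id": "a", "fields": {"indexing_time": "t"}}]}}
without_field = {"hits": {"hits": [{"_id": "a"}]}}
responses = cycle([with_field, without_field])

pattern = field_presence_pattern(lambda: next(responses), "indexing_time", attempts=4)
print(pattern)
```

A strictly alternating pattern against the real cluster would be consistent with the two servers holding different copies of the document.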

Max


(Shay Banon) #4

Strange..., can you provide a recreation? See
http://www.elasticsearch.org/help.


(Max M.) #5

On Thursday, November 24, 2011 2:53:05 PM UTC+1, kimchy wrote:

Strange..., can you provide a recreation? See
http://www.elasticsearch.org/help.

All attempts to recreate the problem have failed. Documents are
reindexed very often, but the problem seems to have occurred only
with that one document. Replaying all operations on a fresh index
did not yield unexpected results.
I can, though, provide you with all the data files and the configuration.
It's just a few KB and does not include private information.

data/config: esdata.zip (http://goo.gl/eABZw)
_index: kqi.docubank.document.documatica_checker.tdoc_checker
_type: docs
_id: piieaimobdcckbghkadffhdlbmkmnpcd

Max

