Duplicate document with same _id and _type


(dchancogne) #1

We are running into the same issue as in this post; unfortunately the thread
doesn't really have an answer :( :
http://elasticsearch-users.115913.n3.nabble.com/duplicate-id-type-within-the-same-index-td3530682.html

Running a search against a single index for a 'post' with a given id, we get
2 hits. We use _parent, with a 'userId' as the parent. The following result
appears after reassigning the post to a different user and reindexing the
post. Is _parent somehow used in the _uid?

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "traackr_alt",
        "_type": "post",
        "_id": "8fc84a5ce16436b2910276c109a439ed",
        "_score": 1,
        "fields": {
          "channelId": "a20fed5f67dd3f3599d8a85bbe7b65de",
          "_routing": "1f27ff80a17f3f4fb6b37e44a3bab059",
          "_parent": "1f27ff80a17f3f4fb6b37e44a3bab059",
          "userId": "1f27ff80a17f3f4fb6b37e44a3bab059"
        }
      },
      {
        "_index": "traackr_alt",
        "_type": "post",
        "_id": "8fc84a5ce16436b2910276c109a439ed",
        "_score": 1,
        "fields": {
          "channelId": "04cca4500ebf3322b4ca31bde245b6c1",
          "_routing": "9b932eb9528d39a3854b556b470d9579",
          "_parent": "9b932eb9528d39a3854b556b470d9579",
          "userId": "9b932eb9528d39a3854b556b470d9579"
        }
      }
    ]
  }
}


(Clinton Gormley) #2

On Fri, 2012-08-03 at 09:13 -0700, David Chancogne wrote:

> We are running into the same issue as in this post; unfortunately the
> thread doesn't really have an answer :( :
> http://elasticsearch-users.115913.n3.nabble.com/duplicate-id-type-within-the-same-index-td3530682.html
>
> Running a search against a single index for a 'post' with a given id,
> we get 2 hits. We use _parent, with a 'userId' as the parent. The
> following result appears after reassigning the post to a different user
> and reindexing the post. Is _parent somehow used in the _uid?

A unique ID really depends on index/type/id/routing. The routing, by
default, is derived from the ID. But if you use parent-child
relationships, it is derived from the parent.

So if you change the parent and reindex, it may or may not end up on the
same shard. If it is on a different shard, then the old doc will not be
removed.
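
To make that concrete, here is a rough Python sketch of the idea that the shard is picked as hash(routing) % number_of_shards. The helper names `routing_value` and `shard_for` are made up for illustration, and `zlib.crc32` is only a stand-in: Elasticsearch uses its own internal hash function, so the shard numbers printed here will not match a real cluster. The IDs and the shard count of 5 are taken from the search response in the first post.

```python
import zlib

NUM_SHARDS = 5  # matches "_shards.total: 5" in the search response above


def routing_value(doc_id, parent_id=None):
    """Routing defaults to the document ID; a parent ID, if given, replaces it."""
    return parent_id if parent_id is not None else doc_id


def shard_for(routing, num_shards=NUM_SHARDS):
    # Stand-in hash (assumption): crc32 only illustrates
    # "hash(routing) % shard count", it is not Elasticsearch's real hash.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


post_id = "8fc84a5ce16436b2910276c109a439ed"
old_parent = "1f27ff80a17f3f4fb6b37e44a3bab059"
new_parent = "9b932eb9528d39a3854b556b470d9579"

old_shard = shard_for(routing_value(post_id, old_parent))
new_shard = shard_for(routing_value(post_id, new_parent))
print("shard with old parent:", old_shard)
print("shard with new parent:", new_shard)
if old_shard != new_shard:
    print("same _id now exists on two shards; the old copy is not overwritten")
```

Whether the two shards differ depends on the hash values, which is why the duplicate "may or may not" show up after a parent change.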

So if you're going to change the parent, you should delete the old doc
(using the old parent's routing, so the delete reaches the shard that holds
it) before indexing the new doc.
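
A minimal sketch of that delete-then-index order, written in Python. It only builds the two HTTP requests and does not send them. The function name `reparent_requests`, the host `http://localhost:9200`, and the document body are assumptions for illustration; the `parent` query parameter on the index and delete endpoints is how Elasticsearch versions of this era accepted the parent, so check it against the version you run.

```python
from urllib.parse import urlencode


def reparent_requests(base_url, index, doc_type, doc_id,
                      old_parent, new_parent, source):
    """Return the requests, in order, needed to move a child doc to a new parent."""
    doc_url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    # 1. Delete the old copy, routed by the OLD parent so it hits the right shard.
    delete_old = ("DELETE", f"{doc_url}?{urlencode({'parent': old_parent})}", None)
    # 2. Index the new copy, routed by the NEW parent.
    index_new = ("PUT", f"{doc_url}?{urlencode({'parent': new_parent})}", source)
    return [delete_old, index_new]


requests_to_send = reparent_requests(
    "http://localhost:9200",  # assumed host
    "traackr_alt",
    "post",
    "8fc84a5ce16436b2910276c109a439ed",
    old_parent="1f27ff80a17f3f4fb6b37e44a3bab059",
    new_parent="9b932eb9528d39a3854b556b470d9579",
    source={"userId": "9b932eb9528d39a3854b556b470d9579"},  # illustrative body
)
for method, url, body in requests_to_send:
    print(method, url, body)
```

Sending the delete without the old parent would route it by the document ID instead, and it could miss the shard that actually holds the stale copy.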

clint


(dchancogne) #3

On Friday, August 3, 2012 12:23:30 PM UTC-4, Clinton Gormley wrote:

> A unique ID really depends on index/type/id/routing. The routing, by
> default, is derived from the ID. But if you use parent-child
> relationships, it is derived from the parent.
>
> So if you change the parent and reindex, it may or may not end up on the
> same shard. If it is on a different shard, then the old doc will not be
> removed.
>
> So if you're going to change the parent, you should delete the old doc
> before indexing the new doc

Thank you, Clint. That makes sense. I just wish the docs on _uid mentioned
that _routing is derived from _parent when present:
http://www.elasticsearch.org/guide/reference/mapping/uid-field.html

Thanks again.

