More than one document with the same _id

Hello.

I have encountered a strange issue on ES 5.6.4 with Java 8. When using scripted_upsert to insert and merge multiple events into a single document, I end up with multiple documents that have the same _id, some of them with no _version. All documents are indexed into the same index and the same shard on the same node.

I am not sure how this is possible, and any hint on how to troubleshoot it is appreciated. Below is an example of the query and the result.

GET /_search
{
  "version": true, 
  "_source": ["DatabaseId", "SequenceId"], 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "DatabaseId": 1202
          }
        },
        {
          "match": {
            "SequenceId": 239761506
          }
        }
      ]
    }
  }
}

{
  "took": 111,
  "timed_out": false,
  "_shards": {
    "total": 48,
    "successful": 48,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 2,
    "hits": [
      {
        "_index": "some-index",
        "_type": "message",
        "_id": "1202-239761506",
        "_version": 1,
        "_score": 2,
        "_source": {
          "DatabaseId": 1202,
          "SequenceId": 239761506
        }
      },
      {
        "_index": "some-index",
        "_type": "message",
        "_id": "1202-239761506",
        "_version": 1,
        "_score": 2,
        "_source": {
          "DatabaseId": 1202,
          "SequenceId": 239761506
        }
      },
      {
        "_index": "some-index",
        "_type": "message",
        "_id": "1202-239761506",
        "_score": 2,
        "_source": {
          "DatabaseId": 1202,
          "SequenceId": 239761506
        }
      },
      {
        "_index": "some-index",
        "_type": "message",
        "_id": "1202-239761506",
        "_version": 1,
        "_score": 2,
        "_source": {
          "DatabaseId": 1202,
          "SequenceId": 239761506
        }
      }
    ]
  }
}

Best regards, Zvonimir

How many shards does the some-index index have? Have you used routing while you have been indexing and/or updating? What result do you get if you run the query and use the document id seen here as routing key?

The index has 48 shards and we are not using custom routing. When searching by _id, ES returns only a single document. The result is the latest, correct version of that document.
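For context on the routing question: without custom routing, Elasticsearch uses the _id as the routing value and picks the shard as a hash of that value modulo the number of primary shards. The following is a minimal sketch of that idea only. `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in for the hash Elasticsearch really uses, so the shard number it prints will not match the real cluster.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number with hash-then-modulo.

    zlib.crc32 is a stand-in hash for illustration only; it will not
    reproduce the shard numbers Elasticsearch actually assigns.
    """
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % num_primary_shards


doc_id = "1202-239761506"
shard = pick_shard(doc_id, 48)

# With no custom routing the _id is the routing key, so every write and
# every GET for this _id resolves to the same single shard out of 48.
print(f"{doc_id} -> shard {shard} of 48")
```

The point of the sketch is that the mapping is deterministic: the same _id always lands on the same shard, which is why copies of one _id would be expected to sit together on one shard rather than be spread across the index.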

Also, when trying to delete the older versions I get the following error. It looks like the older versions were not removed correctly from the shard.

{
  "error": {
    "root_cause": [
      {
        "type": "action_request_validation_exception",
        "reason": "Validation Failed: 1: illegal version value [-1] for version type [INTERNAL];"
      }
    ],
    "type": "action_request_validation_exception",
    "reason": "Validation Failed: 1: illegal version value [-1] for version type [INTERNAL];"
  },
  "status": 400
}

Did you run the search that returned multiple documents with the document id as routing key?

I am not sure what you mean. This query returns only the latest version of the document.

GET some-index/message/313-10354627132

It appears that the root cause of the issue is that the older version wasn't properly deleted when the document was updated with a newer version.

That is a GET request that is automatically routed to a single shard. What does the following give?

GET /_search?routing=1202-239761506
{
  "version": true, 
  "_source": ["DatabaseId", "SequenceId"], 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "DatabaseId": 1202
          }
        },
        {
          "match": {
            "SequenceId": 239761506
          }
        }
      ]
    }
  }
}

It returns 4 documents like in my first post.
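To check other search responses for the same symptom, the hits can be grouped by index, type and _id. This is a rough sketch, not an Elasticsearch API: `find_duplicate_ids` is a hypothetical helper, and the `hits` list below is trimmed from the response in the first post (only the metadata fields are kept).

```python
from collections import defaultdict


def find_duplicate_ids(hits):
    """Return {(index, type, id): [versions]} for ids seen more than once.

    A hit without a "_version" field is recorded as None.
    """
    groups = defaultdict(list)
    for hit in hits:
        key = (hit["_index"], hit["_type"], hit["_id"])
        groups[key].append(hit.get("_version"))
    return {key: versions for key, versions in groups.items() if len(versions) > 1}


# Metadata of the four hits from the first post; the third has no _version.
hits = [
    {"_index": "some-index", "_type": "message", "_id": "1202-239761506", "_version": 1},
    {"_index": "some-index", "_type": "message", "_id": "1202-239761506", "_version": 1},
    {"_index": "some-index", "_type": "message", "_id": "1202-239761506"},
    {"_index": "some-index", "_type": "message", "_id": "1202-239761506", "_version": 1},
]

dupes = find_duplicate_ids(hits)
for key, versions in dupes.items():
    print(key, "copies:", len(versions), "without _version:", versions.count(None))
```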

I have never seen that before, so I will have to leave that for someone else. What do your cluster configuration and hardware look like?

The cluster has 3 dedicated master nodes and 48 data nodes. Indices have 48 shards each and there are ~250 shards per node. Indexing is done in bulk at a rate of ~25k/sec using a Painless scripted_upsert.
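For readers unfamiliar with the pattern, a bulk scripted_upsert item might be built roughly as below. This is only a sketch under assumptions, not the actual indexing code: the `events` field, the `MERGE_SCRIPT` body and the `scripted_upsert_lines` helper are hypothetical, and the script is passed under the `inline` key on the assumption that this is the form used with 5.x.

```python
import json

# Hypothetical merge script: append each incoming event to an "events" list,
# creating the list first when the document was just created by the upsert.
MERGE_SCRIPT = (
    "if (ctx._source.events == null) { ctx._source.events = []; } "
    "ctx._source.events.add(params.event);"
)


def scripted_upsert_lines(index, doc_type, doc_id, event):
    """Build the two newline-delimited JSON lines for one bulk update item."""
    action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
    body = {
        # Run the script even when the document does not exist yet.
        "scripted_upsert": True,
        "script": {
            "lang": "painless",
            "inline": MERGE_SCRIPT,  # assumed key name for 5.x
            "params": {"event": event},
        },
        # Empty starting document that the script then fills in.
        "upsert": {},
    }
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


payload = scripted_upsert_lines(
    "some-index",
    "message",
    "1202-239761506",
    {"DatabaseId": 1202, "SequenceId": 239761506},
)
print(payload, end="")
```

Each item is an action line followed by a body line, so many events for the same _id in one bulk request turn into repeated updates of one document.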

When trying to reindex those documents to a new index I get the following.

POST /_reindex
{
   "source":{
      "index":"some-index",
      "query":{
         "bool":{
            "must":[
               {
                  "match":{
                     "DatabaseId":1202
                  }
               },
               {
                  "match":{
                     "SequenceId":239761506
                  }
               }
            ]
         }
      }
   },
   "dest":{
      "index":"broken-docs"
   }
}

{
  "took": 520,
  "timed_out": false,
  "total": 4,
  "updated": 3,
  "created": 1,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}

GET broken-docs/_count

{
  "count": 1,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
}
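One way to read the numbers above: the reindex read four source hits, the first one created the document in broken-docs and the other three overwrote it, which is what would be expected if all four share one _id. A small sketch of that check follows; `overwritten_copies` is a hypothetical helper and the dictionary is trimmed from the reindex response above.

```python
def overwritten_copies(reindex_response, dest_count):
    """Return how many source hits were overwritten in the destination.

    Raises ValueError if some hits were neither created nor updated,
    because the simple subtraction below would then be misleading.
    """
    total = reindex_response["total"]
    written = reindex_response["created"] + reindex_response["updated"]
    if written != total:
        raise ValueError("not every source hit was written to the destination")
    return total - dest_count


# Values taken from the _reindex response and the _count result above.
response = {"total": 4, "updated": 3, "created": 1, "version_conflicts": 0}
extra = overwritten_copies(response, dest_count=1)
print("source hits that overwrote an existing _id:", extra)
```

Here the result equals the "updated" value in the response, so the reindex and the count agree that the four hits collapse into a single document.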
