Losing documents while reindexing


#1
  • Elasticsearch: 6.2

Hello.

I am trying to update the mapping of an existing index using the operations described below.
These operations work well if the number of documents in the index is relatively small (e.g. 1,000).
However, if the index has many documents (e.g. 100,000), 50-90% of the documents are lost...

Any help?


Step 1. Update Index template

PUT _template/twitter -d @template_twitter.json
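
(Index templates only take effect when a new index is created, so it is worth verifying the template was stored before reindexing. This check is an editorial addition, not part of the original steps.)

GET _template/twitter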

Step 2. Create new temporary index from existing index.

POST _reindex?wait_for_completion=true
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "twitter_temp"
  }
}

Step 3. Delete old index

DELETE twitter

Step 4. Restore from temporary index.

POST _reindex?wait_for_compretion=true
{
  "source": {
    "index": "twitter_temp"
  },
  "dest": {
    "index": "twitter"
  }
}

Step 5. Delete temporary index.

DELETE twitter_temp

After Step 2,

GET twitter/_stats

and

GET twitter_temp/_stats

return JSON in which the _all.total.docs.count fields differ.

So I guess I am missing something in Step 2 that causes the data loss.


(Christian Dahlqvist) #2

Should this not be wait_for_completion? Is this a copy-and-paste issue, or is this what you have actually been running?


(David Pilato) #3

Also, could you run _search?size=0 in both cases instead of the index stats API?
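
For example, using the index names from the first post:

GET twitter/_search?size=0
GET twitter_temp/_search?size=0

hits.total in each response gives the number of documents visible to search.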


#4

Should this not be wait_for_completion ?

The issue happens whether wait_for_completion is present or not.
Actually, I added wait_for_completion hoping it would resolve the issue, but the issue remains.

Is this a copy and paste issue or what you have actually been running?

I don't understand what a copy-and-paste issue means.

This is the actual issue I am facing in my operations.
As mentioned, these steps work for a smaller index.
The index name twitter is a dummy, though.


#5

Below is the result of _search?size=0 for the original index.

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 24007,
    "max_score": 0,
    "hits": []
  }
}

Below is the result of _search?size=0 for the temp index.

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 16294,
    "max_score": 0,
    "hits": []
  }
}

The count for the original index was 24007, but for the temp index it was 16294.

BTW, the result of _reindex (Step 2) is below.
It looks like all 24007 documents were created in the temp index successfully, but only 16294 were returned by search.

{
  "took": 3656,
  "timed_out": false,
  "total": 24007,
  "updated": 0,
  "created": 24007,
  "deleted": 0,
  "batches": 25,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}

(Jun Ohtani) #6

Could you run the query below against the original index?

/_search
{
  "size": 0,
  "aggs":{
    "fuga": {
      "cardinality": {
        "field": "_id",
        "precision_threshold": 25000
      }
    }
  }
}

#7

Actually, "hits" contains more documents, but I omitted them since the forum limits a post to 7,000 characters.

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 24007,
    "max_score": 1,
    "hits": [
      {
        "_index": "dev.annotation_v2",
        "_type": "annotation_type",
        "_id": "[\"410f580a-6183-4a4c-94fd-5083cfcd971b\",\"test_2\",\"dafbdf3d-dea6-45d9-8775-1ac89a603c0d\",\"cf031890-c155-4767-af3b-3b2c26b54a37\"]",
        "_score": 1,
        "_source": {
          "project_id": "410f580a-6183-4a4c-94fd-5083cfcd971b",
          "task_id": "test_2",
          "input_data_id": "dafbdf3d-dea6-45d9-8775-1ac89a603c0d",
          "detail": {
            "annotation_id": "cf031890-c155-4767-af3b-3b2c26b54a37",
            "account_id": "john_doe",
            "label_id": "a07c68fb-b8cd-499a-817f-c2bbb83007d2",
            "is_protected": false,
            "data_holding_type": "inner",
            "data": "70,103,165,138",
            "path": null,
            "etag": null,
            "additional_data_list": [
              {
                "additional_data_definition_id": "2d6b7295-6376-4887-ba1d-81bc43ac0de4",
                "flag": null,
                "integer": null,
                "comment": null,
                "choice": null
              }
            ],
            "created_datetime": "2018-11-12T13:32:14.061+09:00",
            "updated_datetime": "2018-11-12T13:32:14.061+09:00"
          },
          "updated_datetime": "2018-11-12T13:32:14.001+09:00",
          "@timestamp": "2018-11-12T04:32:18.300Z"
        }
      }
    ]
  }
}

(David Pilato) #8

This _id is weird:

 "_id": "[\"410f580a-6183-4a4c-94fd-5083cfcd971b\",\"test_2\",\"dafbdf3d-dea6-45d9-8775-1ac89a603c0d\",\"cf031890-c155-4767-af3b-3b2c26b54a37\"]",

Could you check if it does exist in the original index?
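
One way to check (the _id below is copied from the search result above; since it contains quotes and brackets, an ids query avoids URL-encoding problems, using the dummy index name twitter):

GET twitter/_search
{
  "query": {
    "ids": {
      "values": ["[\"410f580a-6183-4a4c-94fd-5083cfcd971b\",\"test_2\",\"dafbdf3d-dea6-45d9-8775-1ac89a603c0d\",\"cf031890-c155-4767-af3b-3b2c26b54a37\"]"]
    }
  }
}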


#9

Could you check if it does exist in the original index?

Both the original and the temp index have the document with this _id.

My app stores a compound-key-ish value in _id, such as ["a","b","c","d"].


(David Pilato) #10

I'm wondering if this kind of _id could somehow be the source of a bug...

@nik9000 does this ring a bell?


(Nik Everett) #11

I've never tried to use an id like this! It makes sense and I don't think it should be a problem.

This probably isn't it, but maybe add refresh=true to the _reindex just to make sure that the documents really didn't make it. Reindex certainly thinks it wrote the right number of documents, and I haven't seen it hit an issue with counting.
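
Concretely, that would be the Step 2 request with refresh added (refresh is a supported _reindex query parameter):

POST _reindex?wait_for_completion=true&refresh=true
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "twitter_temp"
  }
}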


#12

Sorry for the late response.

Unfortunately, adding ?refresh does not solve the issue.
However, after I deleted twitter_temp completely and ran _reindex again, there was no document loss.
(I had kept the temp index around to verify that its count matched the original index.)

I assume my index was somehow broken...?

Anyway, I will close this issue.


(David Pilato) #13

That can indicate that you had a wrong mapping (no mapping, so the default was used), and that some documents failed the reindex operation because of that.
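
If that is the cause, comparing the mappings of the two indices should show the difference (for example, a dynamically created default mapping on the destination instead of the one from the template):

GET twitter/_mapping
GET twitter_temp/_mapping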