Reindex does not create all documents

Can anyone explain what is going on here? Why is the "created" count lower than "total"?

This is on Elasticsearch 5.4.0.

{
  "completed": true,
  "task": {
    "node": "6XQAqH3qQYqJq9MqCgpeLA",
    "id": 160658656,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 1268925,
      "updated": 0,
      "created": 1267357,
      "deleted": 0,
      "batches": 1268,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1,
      "throttled_until_millis": 0
    },
    "description": "reindex from [ipg.ipg_fc_timing_indicator.pipeline.05810d74-0a06-46f4-b9eb-05f79f10a09d-latest] to [temp-ipg.ipg_fc_timing_indicator.pipeline.05810d74-0a06-46f4-b9eb-05f79f10a09d-latest]",
    "start_time_in_millis": 1519549530514,
    "running_time_in_nanos": 118026405862,
    "cancellable": true
  },
  "response": {
    "took": 118026,
    "timed_out": false,
    "total": 1268925,
    "updated": 0,
    "created": 1267357,
    "deleted": 0,
    "batches": 1268,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
      "bulk": 0,
      "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1,
    "throttled_until_millis": 0,
    "failures": []
  }
}
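
For reference, here is the arithmetic on the counters above as a small Python sketch. It assumes that every hit counted in "total" should end up in exactly one of the other counters, and that the default scroll batch size of 1000 was in effect. Both are assumptions, not something the response itself states.

```python
import math

# Counters copied from the "response" section of the task result above.
response = {
    "total": 1268925,
    "created": 1267357,
    "updated": 0,
    "deleted": 0,
    "noops": 0,
    "version_conflicts": 0,
    "batches": 1268,
}

# Assumption: each hit lands in exactly one of these counters.
accounted = sum(
    response[key]
    for key in ("created", "updated", "deleted", "noops", "version_conflicts")
)
unaccounted = response["total"] - accounted

# Assumption: default batch size of 1000 documents per scroll page.
BATCH_SIZE = 1000
batches_for_total = math.ceil(response["total"] / BATCH_SIZE)
batches_for_created = math.ceil(response["created"] / BATCH_SIZE)

print(unaccounted)          # 1568
print(batches_for_total)    # 1269
print(batches_for_created)  # 1268
```

Under those assumptions, 1568 documents are unaccounted for, and the reported 1268 batches match the "created" count rather than "total".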

Anyone? :expressionless:

I'm really not sure! The total is the total from the search responses for the scroll. The created, updated, deleted, and noops are counted based on how we process the results. I'm certainly curious which documents are missing. Can you check it? You'd have to write a program to walk the two indices in the same order and compare. What was the _reindex request? Did you use ingest pipelines? It feels like a bug though.
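
A minimal sketch of the "walk both indices in the same order and compare" idea, in Python. It only shows the comparison step. The function name `missing_ids` is hypothetical, and it assumes you already have two streams of document IDs, each sorted ascending by the same ordering and free of duplicates. How those streams are fetched from the cluster is deliberately left out.

```python
from typing import Iterable, Iterator


def missing_ids(source_ids: Iterable[str], dest_ids: Iterable[str]) -> Iterator[str]:
    """Yield IDs that appear in source_ids but not in dest_ids.

    Both inputs must be sorted ascending by the same ordering and
    must not contain duplicates (assumption of this sketch).
    """
    dest_iter = iter(dest_ids)
    current = next(dest_iter, None)
    for doc_id in source_ids:
        # Advance the destination cursor past IDs that sort before doc_id.
        while current is not None and current < doc_id:
            current = next(dest_iter, None)
        # Either the destination is exhausted or it skipped over doc_id.
        if current != doc_id:
            yield doc_id


# Toy data standing in for the two sorted ID streams.
source = ["a", "b", "c", "d"]
dest = ["a", "c"]
print(list(missing_ids(source, dest)))  # ['b', 'd']
```

Because both streams are consumed lazily, memory use stays constant however large the indices are. Note that this can only report documents the source stream actually returns.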

This is actually a continuation of this issue:

It happened again, on a different cluster. When I then tried to reindex the data from the "faulty" index, I noticed that reindex itself created fewer documents than the "total" reported for the original index. So I don't think it is possible to retrieve the "missing" documents: Elasticsearch simply doesn't return them, it only counts them in the "total" for search/scroll requests.

The request was a straightforward reindex with only source and dest specified; no pipelines were used.
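
For illustration, a request body of that shape can be reconstructed in Python as below. The exact request was not posted, so this is an assumption: the index names are taken from the task description earlier in the thread, and the body would be sent as `POST _reindex`.

```python
import json

# Index names copied from the "description" field of the task result.
SOURCE_INDEX = (
    "ipg.ipg_fc_timing_indicator.pipeline."
    "05810d74-0a06-46f4-b9eb-05f79f10a09d-latest"
)
DEST_INDEX = "temp-" + SOURCE_INDEX

# Assumed body: only "source" and "dest", no query, script, size, or pipeline.
reindex_body = {
    "source": {"index": SOURCE_INDEX},
    "dest": {"index": DEST_INDEX},
}

print(json.dumps(reindex_body, indent=2))
```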

I've reached out to some folks who know more about the inner workings of scroll than I do to have a look at this.
