Not all documents copied after reindex

I modified a template for the index to convert a couple of variables from string to long.
Then I issued a reindex call:

POST _reindex
{
  "source": {
    "index": "index-old"
  },
  "dest": {
    "index": "index-new"
  }
}

No errors anywhere, but I'm missing about half the documents when the task completes:

GET _cat/indices/index-*
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index-old tEdSeRmFQ7WhGFoWb4vYVw   6   1     993613            0      1.9gb        986.7mb
green  open   index-new 3KOMWY9lSyyASJS9TgPMEQ   6   1     532999            0        1gb        541.2mb    

I've done this many times before and reindex is usually very reliable. How can I figure out what happened to the rest of the documents?

Thanks!


Is there anything in your Elasticsearch logs that might explain it?

Nothing. I was tailing the logs in real time on the server where this task was running. Not a single entry.

Very odd. Can you post the mappings from each index and a sample doc?

Mappings are ~800 lines long. Do you want just the differences or the whole thing?

What about putting it into gist/pastebin/etc?

Did the reindex not complete? Check the task list for the details:

GET _cat/tasks?pretty&v
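
If you start the reindex with wait_for_completion=false, the stored task result also records any per-document failures, so a silent count mismatch like this should show up there. A sketch (the task id below is illustrative, not a real one):

POST _reindex?wait_for_completion=false
{
  "source": { "index": "index-old" },
  "dest": { "index": "index-new" }
}

GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345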

Reindexing task definitely completes. I monitor it in the log files and with

GET _tasks?detailed=true&actions=*reindex
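
The status block of that call reports per-document counters, so if created + updated + version_conflicts ends up well short of total when the task finishes, documents are being dropped somewhere. Roughly what I'm watching (values illustrative, not from an actual run):

"status": {
  "total": 1151419,
  "created": 634999,
  "updated": 0,
  "deleted": 0,
  "batches": 635,
  "version_conflicts": 0,
  "noops": 0
}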

Old mapping:
https://pastebin.com/fxnsHE27

New mapping:
https://pastebin.com/8dGyEgFe

Example doc:
https://pastebin.com/ZTQLqxTS

Another index is now exhibiting an identical problem: it copies about half the data and just stops without errors. This one is much larger, so I don't think it's a resource problem.

Any ideas?

What version are you on?

elasticsearch-6.2.3

A couple of other things I noticed:
It doesn't always stop on the same document.
Out of a total of 1151419 documents, the first run quit on 634999 and the second on 651999.

I was also looking at _nodes/stats and didn't see any significant differences between the nodes running the reindex tasks and those that weren't.

I tried reindexing into a different name to make sure errors in the template weren't affecting reindexing, and the same problem occurred: only about half the data got copied.

Does anyone have any other suggestions for troubleshooting/debugging this?

Thanks!

It might be worth raising an issue on GitHub.

Slightly unrelated question:
What happens when template includes a mapping for a variable to be cast as long, but variable comes in with quotes like "1024"? Does it get discarded? I noticed that the sum docs.count + docs.deleted are close to the total number of documents in the original index. So while there are still some documents completely missing from destination index this would at least explain part of it.

In my latest test out of 1151419 total documents, destination index contains docs.count = 684045 and docs.deleted = 196412. So while 270962 are still missing, at least 200k are simply deleted.

I also tried reindexing into a name that doesn't match any templates and all documents made it across. So while it might still be a bug worth reporting on GitHub I suspect I just don't have a full understanding of how mapping changes affect reindexing.
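
From what I can tell from the docs, numeric fields have coerce enabled by default, so a quoted value like "1024" should be parsed to a long rather than discarded. Disabling coercion in the template would at least make bad values fail loudly as mapping errors instead of being silently accepted. Something like this, if I understand it right (template and field names are just examples, not my actual ones):

PUT _template/index-template-test
{
  "index_patterns": ["index-new*"],
  "mappings": {
    "doc": {
      "properties": {
        "bytes": {
          "type": "long",
          "coerce": false
        }
      }
    }
  }
}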

Deleted docs could be an indicator of things being overwritten.

Are you using beats as the data source? Are they passed to anything else before they hit Elasticsearch?

Deleted docs only show up during reindexing, but the original source is filebeat.
The full path for the data is filebeat -> logstash (all filtering happens here) -> redis -> logstash -> elasticsearch.

I'm not altering the data in any way during reindexing. Only the mapping template is different since I'm trying to remap a couple of strings into longs.

Do I need to cast the field in a script, or should updating the mapping in the template be enough?
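
i.e. something along these lines, if a script is what's needed (the field name is just an example, not my actual field):

POST _reindex
{
  "source": { "index": "index-old" },
  "dest": { "index": "index-new" },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.bytes instanceof String) { ctx._source.bytes = Long.parseLong(ctx._source.bytes); }"
  }
}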

Thanks!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.