No errors anywhere, but I'm missing about half the documents when the task completes:
GET _cat/indices/index-*
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index-old tEdSeRmFQ7WhGFoWb4vYVw   6   1     993613            0      1.9gb        986.7mb
green  open   index-new 3KOMWY9lSyyASJS9TgPMEQ   6   1     532999            0        1gb        541.2mb
I've done this many times before and reindex is usually very reliable. How can I figure out what happened to the rest of the documents?
Another index is now exhibiting an identical problem. It copies about half the data and just stops without errors. This one is much larger, so I don't think it's a resource problem.
Couple of other things I noticed:
It doesn't always stop on the same document.
Out of a total of 1151419 documents, the first run quit at 634999 and the second at 651999.
I was also looking at _nodes/stats and didn't see any significant differences between the nodes running the reindex tasks and those that weren't.
I tried reindexing into a different name to make sure errors in the template aren't affecting reindexing and the same problem exists. Only about half the data got copied.
Does anyone have any other suggestions for troubleshooting/debugging this?
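One thing that might help narrow this down is accounting for every counter in the reindex response (or in the task status from the Tasks API) instead of only comparing docs.count. Below is a minimal sketch in Python, assuming the status has already been fetched and parsed into a dict. The field names (total, created, updated, deleted, noops, version_conflicts, failures) are the ones the reindex response reports; the helper name summarize_reindex and the sample numbers are made up for illustration.

```python
def summarize_reindex(status: dict) -> dict:
    """Break a _reindex response / task status down into written, skipped and unaccounted docs."""
    total = status.get("total", 0)
    # "created" are new documents; "updated" overwrote a document that
    # already existed in the destination under the same _id.
    written = status.get("created", 0) + status.get("updated", 0)
    skipped = status.get("noops", 0) + status.get("version_conflicts", 0)
    deleted = status.get("deleted", 0)
    processed = written + skipped + deleted
    return {
        "total": total,
        "written": written,
        "skipped": skipped,
        "deleted": deleted,
        "unaccounted": total - processed,
        "failure_count": len(status.get("failures", [])),
    }


if __name__ == "__main__":
    # Hypothetical numbers, not taken from a real response.
    sample = {"total": 10, "created": 4, "updated": 3, "noops": 1,
              "version_conflicts": 0, "deleted": 0, "failures": []}
    print(summarize_reindex(sample))
```

A non-zero updated count would suggest that several source documents ended up under the same _id in the destination, while a non-zero unaccounted value would suggest the task stopped before reaching total.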
Slightly unrelated question:
What happens when a template includes a mapping that casts a field to long, but the value comes in with quotes, like "1024"? Does it get discarded? I noticed that the sum of docs.count and docs.deleted is close to the total number of documents in the original index. So while there are still some documents completely missing from the destination index, this would at least explain part of it.
In my latest test, out of 1151419 total documents, the destination index contains docs.count = 684045 and docs.deleted = 196412. So while 270962 are still missing, almost 200k are simply deleted.
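The arithmetic behind those numbers, as a small Python check. The function name missing_docs is only for illustration; the inputs are the figures quoted above.

```python
def missing_docs(source_total: int, dest_count: int, dest_deleted: int) -> int:
    """Documents from the source that show up neither as live nor as deleted in the destination."""
    return source_total - dest_count - dest_deleted


if __name__ == "__main__":
    # Figures from the latest test run described above.
    print(missing_docs(1151419, 684045, 196412))  # 270962
```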
I also tried reindexing into a name that doesn't match any templates and all documents made it across. So while it might still be a bug worth reporting on GitHub I suspect I just don't have a full understanding of how mapping changes affect reindexing.
Deleted docs only show up during reindexing, but the original source is Filebeat.
The full path for the data is filebeat -> logstash (all filtering happens here) -> redis -> logstash -> elasticsearch.
I'm not altering the data in any way during reindexing. Only the mapping template is different since I'm trying to remap a couple of strings into longs.
Do I need to do it by casting the variable in a script, or should updating the mapping in a template be enough?
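For the script route, here is a rough sketch of what the request body for POST _reindex could look like, built in Python so it is easy to tweak. It is only an illustration under assumptions: the field names bytes and response_time are hypothetical stand-ins for the strings being remapped, build_reindex_body is a made-up helper, and the Painless snippet just converts string values with Long.parseLong while leaving everything else untouched.

```python
import json

# Painless script: convert the listed fields from string to long when needed.
CAST_SCRIPT = (
    "for (def f : params.fields) {"
    " def v = ctx._source[f];"
    " if (v instanceof String) { ctx._source[f] = Long.parseLong(v); }"
    " }"
)


def build_reindex_body(source_index: str, dest_index: str, long_fields: list) -> dict:
    """Build a _reindex request body that casts the given fields to long."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            "source": CAST_SCRIPT,
            "params": {"fields": list(long_fields)},
        },
    }


if __name__ == "__main__":
    # "bytes" and "response_time" are hypothetical field names.
    body = build_reindex_body("index-old", "index-new", ["bytes", "response_time"])
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a POST _reindex request. Whether the script is needed at all depends on the mapping: as far as I understand, numeric fields coerce quoted numbers like "1024" by default, so the script would mainly matter if coercion is disabled or the stored _source should hold real numbers.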