Snapshot/Restore VS ReIndexing - Missing documents, zero failures

The documents do not vanish. The document count in the cat indices API includes internal nested documents, which are not present in the new index because the mappings are not the same. If you use the count API against both indices, you should see that they contain the same number of documents.
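A minimal sketch of that check, assuming both clusters are reachable over plain HTTP without authentication. The function names, URLs and index names are made up for illustration; only the `GET /<index>/_count` endpoint and its `count` field come from Elasticsearch itself.

```python
import json
import urllib.request


def count_docs(base_url, index):
    """Return the document count reported by GET /<index>/_count."""
    with urllib.request.urlopen(f"{base_url}/{index}/_count") as resp:
        return json.load(resp)["count"]


def compare_counts(source, dest, fetch=count_docs):
    """Compare two (base_url, index) pairs.

    `fetch` is injectable so the comparison can be tried without a cluster.
    Returns (source_count, dest_count, counts_match).
    """
    source_count = fetch(*source)
    dest_count = fetch(*dest)
    return source_count, dest_count, source_count == dest_count
```

Something like `compare_counts(("http://localhost:9200", "old-index"), ("http://localhost:9200", "new-index"))` would then report whether the two indices really differ, independently of what the cat indices API shows.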

If you are writing a script you need to do the following for each index:

  • Fetch the existing mapping from the index you are reindexing from
  • Create the new index you are to reindex into with the mapping extracted in the previous step
  • Reindex
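The three steps above could be scripted roughly like this. It is only a sketch under a few assumptions: source and destination live in the same cluster, the cluster needs no authentication, and the source mapping can be reused unchanged. The helper names `send` and `reindex_with_mapping` are hypothetical. Index settings such as custom analyzers are not copied here, so a mapping that references them would still need those settings added by hand.

```python
import json
import urllib.request


def send(base_url, method, path, body=None):
    """Tiny JSON-over-HTTP helper (assumes an unauthenticated cluster)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        f"{base_url}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def reindex_with_mapping(source, dest, call):
    """Copy `source` into a new index `dest` that reuses the source mapping.

    `call(method, path, body=None)` performs the HTTP request, which keeps
    the three steps easy to dry-run with a stub.
    """
    # Step 1: fetch the existing mapping (the response is keyed by index name).
    mappings = call("GET", f"/{source}/_mapping")[source]["mappings"]
    # Step 2: create the destination index with that mapping.
    call("PUT", f"/{dest}", {"mappings": mappings})
    # Step 3: reindex the documents.
    return call(
        "POST",
        "/_reindex",
        {"source": {"index": source}, "dest": {"index": dest}},
    )
```

To run it against a real cluster, `call` can be built with `functools.partial(send, "http://localhost:9200")`, where the URL is again just an example.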

I think the Upgrade Assistant actually only runs a "local reindex", which is really an update by query:

POST my-index-000001/_update_by_query?conflicts=proceed

Or a force merge. I don't remember exactly.

POST /my-index-000001/_forcemerge

The goal is to rewrite the segment files that were created with Elasticsearch 6 (Lucene 7) in the Lucene 8 format, because Elasticsearch 8 can only read Lucene 8 and Lucene 9 files.
That's why this is needed, unless you have an Enterprise license, which lets you read older indices directly and avoid all these manipulations.

Hope this clarifies what is happening behind the scenes :wink:

What a day this was...

The amount of tests I did to solve the problems with the Mapping... uffff... so many errors...

Changes that I made to my Elastic 6.8 mapping file:

  • Added a "_doc" section;
  • Added a new "analysis" section with autocomplete_filter, an analyzer, and a bunch more crap...;
  • Added index.mapping.total_fields.limit, because the default of 1000 fields wasn't enough;
  • Added include_type_name=true to my curl command when creating my test index.

And only after that, I could manually _reindex my data from 6.8 into the new version.

This was "basically" what you guys said.

  • Extract Mapping from Elastic 6.X
  • Create empty index with mapping of Elastic 6.X in Elastic 7.X
  • ReIndex previously restored index in Elastic 7.X, into the above index.
  • Create aliases so that, in theory, we don't make developers cry.
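For the alias step, the request body could be built like this. A rough sketch: `alias_request` is a hypothetical helper and the index and alias names in the usage note are invented; only the `POST /_aliases` endpoint with an `add` action is the actual Elasticsearch API.

```python
def alias_request(index, alias):
    """Build the (method, path, body) triple that adds `alias` to `index`."""
    body = {"actions": [{"add": {"index": index, "alias": alias}}]}
    return "POST", "/_aliases", body
```

For example, `alias_request("logs-2023.01.01-v2", "logs-2023.01.01")` gives the request to send with whatever HTTP client the script uses. Keep in mind that an alias cannot have the same name as an index that still exists in the cluster.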

Bam... profit. Cost? I've become bald. :crazy_face:

Next steps... I will need to script this entire process. The best thing in this whole situation is that I only have to extract and "transform" 2 mappings, since I only have 2 main structures, 2 "main indexes".
The 1000 indices that I have in the production environment are basically partitions, 1 per day.

Anyway, I think for now, I'm all set.

I wanna thank all of you who helped me out during these past few days!

Thanks for the help and have a nice week!

Why not reindex directly into 8.x?

Because I'm dumb as f.... :rofl:

Since I've got the mapping situation right.... guess I could... try to reindex from 6 to 8... I'm gonna test that first thing in the morning. :laughing:

But I think big indexes... like 4GB... could give some problems... I don't know... but in theory it is way easier

Just to close this situation.

I had to make a few changes to my mapping file in order to use it in Elastic 8:

  • Removed "_doc"
  • Removed "include_type_name=true" from the execution since it was removed in Elastic 8.

And that was it...
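Removing the "_doc" level can be automated when the mapping is extracted by script. The following is only a sketch: `strip_mapping_type` is a hypothetical helper, and it assumes the 6.x mapping has the usual shape of a single type name wrapping a "properties" object. Anything else is returned untouched, so unusual mappings would still need a manual look.

```python
def strip_mapping_type(mappings):
    """Drop the single type level from a 6.x-style mapping.

    {"_doc": {"properties": {...}}} becomes {"properties": {...}}.
    Mappings that already look typeless are returned unchanged.
    """
    if len(mappings) == 1 and "properties" not in mappings:
        (inner,) = mappings.values()
        # Heuristic: a type wrapper holds a dict with its own "properties".
        if isinstance(inner, dict) and "properties" in inner:
            return inner
    return mappings
```

The result is what goes under "mappings" when creating the index in Elastic 8, without include_type_name.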

Now I think I'll need to build some Java program with Thread Pools, to speed up the process.

Task pool:

  • List of Indices names.

Task:

  • Step 1 - create index with mapping
  • Step 2 - Reindex
  • Step 3 - Refresh (?)
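The task pool idea could look roughly like this. Note that this sketch uses Python's `concurrent.futures` rather than the Java thread pools mentioned above, purely to keep it short. `migrate_all` and `migrate_one` are hypothetical names, and `migrate_one` stands for whatever function performs the steps for a single index. The worker count is an arbitrary example value, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def migrate_all(index_names, migrate_one, max_workers=4):
    """Run `migrate_one(name)` for every index using a bounded thread pool.

    Returns {name: ("ok", result)} or {name: ("failed", exception)} so that
    one broken index does not stop the other migrations.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(migrate_one, name): name for name in index_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = ("ok", future.result())
            except Exception as exc:  # keep going, report the failure later
                results[name] = ("failed", exc)
    return results
```

Collecting failures per index instead of aborting seems useful with 1000+ indices, since the failed ones can simply be retried in a second run.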

1000-1500 indices to migrate... this will be insane...

Great!

That's the best option IMHO when moving up two major versions. You will immediately benefit from a lot of optimizations.

If the data you are migrating lives in time-based indices, I'd also add a force merge call to 1 segment at the end of your process (before the refresh).
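As a sketch, the end of the per-index task could then be written like this. `finalize_index` and the `call(method, path)` helper are hypothetical; the two endpoints, `_forcemerge` with `max_num_segments=1` and `_refresh`, are the standard Elasticsearch ones.

```python
def finalize_index(index, call):
    """Force merge `index` down to one segment, then refresh it.

    `call(method, path)` is whatever function sends the HTTP request.
    """
    # Merge first, so the refresh exposes the final single segment.
    call("POST", f"/{index}/_forcemerge?max_num_segments=1")
    call("POST", f"/{index}/_refresh")
```

This only makes sense for indices that no longer receive writes, which is typically the case for older daily indices.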
