Snapshot/Restore VS ReIndexing - Missing documents, zero failures

Hi everyone.

A few weeks ago I created a post regarding a data migration I have to do for a customer with a Platinum License. The conclusion was that I need to go from Elastic 6 to 7 and then to 8.12 in order to have the data fully migrated and accessible.

Right now, in a development environment, I'm doing a snapshot & restore from 6 to 7.

Everything seems great. I have an index with 4361960 docs and 738.9mb.

When I reindex this index into a new one, the new index shows 142638 docs and 631.3mb.

If I do the same reindex from remote, for the same index, I get the same number of records, 142638, and around the same size, 633.9mb.

I'm completely lost here... I need to reindex the information in Elastic 7 in order to pass it to Elastic 8, because the archive feature is not supported by the Platinum license.

I have zero failures when doing the reindex process.

From 6 to 7, is something being restructured so heavily that millions of docs get eliminated?

Thanks for your time!

Kind regards.

Is it a 1:1 reindex, i.e. one source index to one destination index? Or is it N:1, e.g. using a wildcard in the source index?

Can you run this request on Kibana Dev tools on both your source and destination cluster?

GET _cat/indices/index-name?v

Also, what was the reindex request you ran? Since it's not much data, can you run it again and share both the request and the response?
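For reference, a minimal 1:1 reindex request in Dev Tools looks something like this (the index names are placeholders):

POST _reindex
{
  "source": {
    "index": "source-index-name"
  },
  "dest": {
    "index": "dest-index-name"
  }
}

The response includes the number of documents created and updated, plus any failures, which is exactly the kind of detail worth sharing.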

(this is a development/test environment: only 1 node for Elastic 6 and another node for Elastic 7)

Ok so...

  • 20240202 is the snapshot from 6, restored directly into 7. Same number of docs and same size on both instances (I've checked!).
  • 20240202_8_reindextst is the reindex process, pulling the data from the 6 instance into the 7 instance (reindex from remote).
  • 20240202_8 is the reindex process run locally.

This is what I'm running (sorry but I don't have access to Kibana frontend at the moment)

And for the snapshot restore, I'm doing the entire process and then running the restore command, which seems to be working since the number of docs is the same between both instances, 4361960:

curl -u elastic:$ELASTICPASS -X POST "$HOSTNAME:9200/_snapshot/repository/XXXXXXXX_XXX_20240202/_restore" -H 'Content-Type: application/json' -d '{
  "indices": "XXXXXXXX_XXX_20240202",
  "ignore_unavailable": true,
  "include_global_state": false
}'

Did you apply the same mapping in the destination cluster?
I have a feeling that you have nested fields but didn't set the mapping accordingly.

Hi David!

Thanks for the insight!

To tell you the truth, I don't even know what mappings are... I'm going to learn and investigate the concept and get back to you as soon as possible.

In my mind, snapshot/restore + reindex would be simple processes, and as a sysadmin I wouldn't have to deal with the data schemas.

But give me a few hours to learn about it and compare with what was done in Elastic 6, and I'll get back to you.

Thanks!

So snapshot/restore is the "db admin" task you'd look for.
It's IMO an admin task.

Reindex is:

  • Search for documents
  • For each document, read the original JSON from the _source field
  • Send that JSON to the new index

I consider that more as a DEV task than an admin task.

If your index has not been configured (same as in RDBMS, if the DBA did not create the destination table), then Elasticsearch tries to guess and creates a default schema.

You can run on the source cluster:

GET /INDEXNAME/

And use that info to create the destination index. See Create index API | Elasticsearch Guide [8.12] | Elastic
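A minimal sketch of that (the field names here are made up): if the GET output shows a nested field, the destination index should be created with the same nested mapping before running the reindex, e.g.:

PUT /DESTINATION-INDEX
{
  "mappings": {
    "properties": {
      "comments": {
        "type": "nested",
        "properties": {
          "author":  { "type": "keyword" },
          "message": { "type": "text" }
        }
      }
    }
  }
}

Without this, dynamic mapping will map an array of objects as a plain object field rather than nested.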

The thing is... I need to migrate the data from Elastic 6.x to 8.12. And to do that with a Platinum License, I have to pass through Elastic 7.x.

(Only the Enterprise license is capable of dealing with the Archive data, in this case, the jump from 6 to 8.12)

I've already tested the Snapshot-restore process, and it works perfectly from 6 to 7. But when I do the same process from 7 to 8, I still get the "archive" error/warning on 8.12.

That's why I was trying to reindex the information on Elastic 7, to see if the data needed some changes (at least that was my idea...).

I've tried reindexing a smaller index and I still get missing documents: it goes from the original 11000 docs to 400 in the new reindexed index.

In my mind... snapshot/restore from 6 to 7 and then from 7 to 8 should have done the trick... but then again... when I reach 8... it still gives me the archive error. :confused:

Once you are on 7.17, open the upgrade assistant.
It will help you to update old 6.x indices to the new format.

See Upgrade Elasticsearch | Elasticsearch Guide [8.12] | Elastic

This enables you to use the Upgrade Assistant to identify and resolve issues, reindex indices created before 7.0, and then perform a rolling upgrade.

So basically... what you are saying is that with my index snapshotted/restored from 6 to 7, the one I think is correct (with 4 million docs), I should be able to do something with it in the Upgrade Assistant menu in Elastic 7/Kibana?

I've already asked the network crew to open up communication from "my workstation" to port 5601 on the Elastic 7 node. I'm still waiting.

Got to try that idea...

If you have access to Elasticsearch, then you can still get the mapping of the indices as asked before.

For example, run a curl against your Elasticsearch node on the endpoint $HOSTNAME:9200/index-name/_mapping

This will show you the mapping of the index. Run it against your version 6 cluster for one of the indices and then against your version 7 cluster for the same reindexed index; this will help you compare the differences.
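Something along these lines should work (hostnames, index names and credentials are placeholders):

curl -u elastic:$ELASTICPASS "$V6_HOST:9200/xxxxxx_xxx_20240202/_mapping?pretty" > mapping_v6.json
curl -u elastic:$ELASTICPASS "$V7_HOST:9200/xxxxxx_xxx_20240202_8/_mapping?pretty" > mapping_v7.json
diff mapping_v6.json mapping_v7.json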

It's huge... the mapping, from top to bottom, has more than 4000 lines on 6, and on 7, after the reindex into a new index, it shows 2000 lines.

This already indicates a possible issue.

Can you share the mappings? Put them in a gist on GitHub because of the size.

Search the mappings for the word nested and see if the counts match.


On the Snapshot(6)-Restore(7) full index, the one with 4 million docs, the mapping on Elastic 7 shows 26 occurrences of "nested".

On the reindexed index, the one with 142638 records, it shows... ZERO occurrences of "nested".

  • Snapshot(6)-Restore(7) index = 4361960 docs - 26 occurrences of "nested" in the mapping
  • Reindexed index = 142638 docs - 0 occurrences of "nested" in the mapping

It makes sense now as nested fields are counted as individual documents.
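A minimal way to see this (index and field names are made up): a single document with three nested objects is stored as four Lucene documents, and _cat/indices counts all of them, while _count only counts the top-level documents.

PUT /nested-demo
{
  "mappings": {
    "properties": {
      "user":   { "type": "keyword" },
      "events": { "type": "nested" }
    }
  }
}

PUT /nested-demo/_doc/1?refresh
{
  "user": "alice",
  "events": [
    { "action": "login" },
    { "action": "search" },
    { "action": "logout" }
  ]
}

GET _cat/indices/nested-demo?v
GET /nested-demo/_count

Here _cat/indices should report docs.count = 4 (1 parent + 3 nested), while _count returns 1. If the destination index has no nested mapping, the same data ends up stored as a single document, which is why docs.count drops so dramatically without any reindex failures.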

Before doing the reindex did you create the template for your index on the destination cluster?

You need to use the same template/mapping.

You can get a list of the templates on the endpoint GET /_template/ and the individual template with GET /_template/template-name.

You will need to find which template is used by the indices, then copy and apply it on the destination index.
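For example (the template name and fields below are made up; the real body is whatever the source cluster returns), on the source cluster:

GET /_template/my-template

Then re-create it on the destination cluster with the body you got back:

PUT /_template/my-template
{
  "index_patterns": ["xxxxxx_xxx_*"],
  "mappings": {
    "properties": {
      "events": { "type": "nested" }
    }
  }
}

Note that a template copied from 6.x may still contain a mapping type name, which would need to be removed (or the PUT run with ?include_type_name=true) on 7.x.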


Is it possible that I don't have any templates on Elastic 6? Because it really feels like there are none for the normal indices...

Also, if I run the following:
GET /xxxxxx_xxx_20240202/_settings

I don't see any "template" tag.

And again, I'm not a developer :frowning:

Which path do you want to follow? I'm a bit lost...

  • Using Snapshot/Restore?
  • Using reindex from remote?

Snapshot/Restore

Restore in 7.17, then open the migration assistant and follow the instructions. Then upgrade the 7.17 cluster to latest 8.x

Reindex from remote

In the old cluster, run:

GET /INDEXNAME

Take the output of this and run this in the 8.x cluster:

DELETE /INDEXNAME
PUT /INDEXNAME
{
   // The json you got from the previous step
}

Then run the reindex from remote call.
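The reindex from remote call would look roughly like this (host and credentials are placeholders; the old cluster also has to be listed in reindex.remote.whitelist in the new cluster's elasticsearch.yml):

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://old-cluster-host:9200",
      "username": "elastic",
      "password": "your-password"
    },
    "index": "INDEXNAME"
  },
  "dest": {
    "index": "INDEXNAME"
  }
}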

That's basically the 2 options.

Let us know if something is unclear or does not work...
Note that the call GET /INDEXNAME might include some undesired metadata... But if you're not sure, share the output here and we'll help.

Hi David.

It's not you who is lost. It's me :smiley:

Since we can't jump two Elastic versions (from 6 to 8) with the snapshot/restore method while on the Platinum license, I thought the only way to solve this was to create a new Elastic 7 instance, snapshot/restore from 6 to 7, and then run the reindex method to "change" the data so it could be (again) snapshot/restored from 7 to 8.

This was my initial idea, after people told me that one way to solve the "archive" situation was to pass through 7...

Now it seems, I understood the situation completely wrong.

So, the right method is to snapshot from 6 to restore on 7, and then! upgrade the installation to Elastic 8?

Yes. That's right.

Well... since Friday I've rebuilt my entire lab environment and I was finally able to migrate from 6 to 8 following your insights and advice.

What is bugging me a lot now is the actual process of reindexing via the Kibana Upgrade Assistant.

If I perform the reindexing process in Dev Tools, I lose a bunch of docs because of the "nested" situation.

But if I perform the "same" process in the Upgrade Assistant, everything performs correctly.

The huge problem I have is that this process, in the Kibana WebUI, is a manual one.

I have more than 1000 indices that I need to migrate and therefore reindex via the Upgrade Assistant... and right now, since I need to click a few times to change some checkboxes, etc., it all works... but this is not doable for 1000+ indices.

My question is... if a simple source-to-destination reindex command doesn't work in Dev Tools (the nested situation makes my docs fly away into the fifth dimension)... and in the Kibana Upgrade Assistant everything goes perfectly... how the heck am I supposed to do this for 1000+ indices???? :crazy_face:

EDIT:
Btw, the Upgrade Assistant asks for the following changes, which need to be checked/confirmed via manual input:

  • Index replacement - basically reindexing and creating the alias...
  • Mapping replacement - replacing doc with _doc
  • Removing index.soft_deletes.enabled (deprecated).

And then... magically... it does its job... /sadface... XD
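For what it's worth, the per-index work does not have to stay manual: the same kind of sequence can be driven through the API and scripted over a list of indices. This is only a rough sketch of the idea with made-up names, not what the Upgrade Assistant does internally:

# 1. Get the existing mapping and settings of the old index
GET /old-index

# 2. Create the new index, reusing that mapping (including the nested fields)
PUT /old-index-reindexed
{
  "mappings": {
    "properties": {
      "events": { "type": "nested" }
    }
  }
}

# 3. Copy the documents
POST _reindex
{
  "source": { "index": "old-index" },
  "dest":   { "index": "old-index-reindexed" }
}

# 4. Atomically delete the old index and point an alias with its old name at the new index
POST /_aliases
{
  "actions": [
    { "remove_index": { "index": "old-index" } },
    { "add": { "index": "old-index-reindexed", "alias": "old-index" } }
  ]
}

Looped over the output of GET _cat/indices with curl, that sequence could in principle cover the 1000+ indices, as long as step 2 really reuses each index's own mapping so the nested fields are preserved.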