A few weeks ago I created a post about a data migration I have to do for a customer with a Platinum license. The conclusion was that I needed to go from Elastic 6 to 7 and then to 8.12 in order to have the data fully migrated and accessible.
Right now, in a development environment, I'm doing a snapshot & restore from 6 to 7.
Everything seems great. I have an index with 4361960 docs and 738.9mb.
When I try to reindex this index, and create a new one, it then shows 142638 docs and 631.3mb.
If I do the same reindex from remote, same index, I get the same number of records, 142638 and around the same size 633.9mb.
I'm completely lost here... I need to reindex the information in Elastic 7 in order to pass it to Elastic 8, because the archive feature is not supported by the Platinum license.
I have zero failures when doing the reindex process.
From 6 to 7, is there something that is being super restructured, eliminating the need for millions of other docs?
And for the snapshot restore, I'm doing the entire process and then running the restore command, which seems to be working since the number of docs is the same between both instances, 4361960:
So snapshot/restore is the "db admin" task you'd look for.
It's IMO an admin task.
Reindex is:
Search for documents
For each document, read the original JSON from the _source field
Send that JSON to the new index
I consider that more as a DEV task than an admin task.
If your index has not been configured (same as in RDBMS, if the DBA did not create the destination table), then Elasticsearch tries to guess and creates a default schema.
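To make the three reindex steps above concrete, here is a minimal sketch of what they amount to, assuming the search hits have already been fetched. The function name `hits_to_bulk_lines`, the sample hits, and the index name `my-index-reindexed` are made up for illustration; this is not how Elasticsearch implements `_reindex` internally. The point is that only `_id` and `_source` travel to the new index, so the mapping of the old index does not come along.

```python
import json


def hits_to_bulk_lines(hits, dest_index):
    """Turn search hits into an NDJSON body for the _bulk API."""
    lines = []
    for hit in hits:
        # Action line: reuse the original _id so a rerun overwrites, not duplicates.
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        # Source line: the original JSON exactly as stored in _source.
        lines.append(json.dumps(hit["_source"]))
    # The _bulk API expects a newline after the last line as well.
    return "\n".join(lines) + "\n"


# Hypothetical hits, shaped like the "hits.hits" array of a search response.
sample_hits = [
    {"_id": "1", "_source": {"user": "a", "items": [{"sku": "x"}, {"sku": "y"}]}},
    {"_id": "2", "_source": {"user": "b", "items": []}},
]
bulk_body = hits_to_bulk_lines(sample_hits, "my-index-reindexed")
print(bulk_body)
```

Nothing in that body says whether `items` is a nested field or a plain object. That decision comes from the destination index mapping, which is why the destination should be created up front.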
The thing is... I need to migrate the data from elastic 6.X to 8.12. And to do that, with Platinum License, I have to pass through Elastic 7.X.
(Only the Enterprise license is capable of dealing with the Archive data, in this case, the jump from 6 to 8.12)
I've already tested the Snapshot-restore process, and it works perfectly from 6 to 7. But when I do the same process from 7 to 8, I still get the "archive" error/warning on 8.12.
That's why I was trying to reindex the information on Elastic 7 to see if it needed some changes on the data (at least that was my idea....)
I've tried reindexing a smaller index and I still get missing documents: it goes from the original 11000 to 400 in the new reindexed index.
In my mind... snapshot-restore 6 to 7 and 7 to 8 should have done the trick... but then again... when I reach 8... it still gives me the archive error.
So basically... what you are saying is that with my snapshot/restored index from 6 to 7, the one I think is correct (with 4 million docs), I should be able to do something in the Upgrade Assistant menu in Elastic 7/Kibana?
I've already asked the network crew to open up communication from "my workstation" to the Elastic 7 node 5601 port. I'm still waiting.
If you have access to Elasticsearch then you can still get the mapping of the indices as asked before.
For example, run a curl against your Elasticsearch node on the endpoint $HOSTNAME:9200/index-name/_mapping
This shows the mapping of the index. Run it against your version 6 cluster for one of the indices, then against your version 7 cluster for the same reindexed index, and compare the two to see what differs.
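Comparing two mapping outputs by eye gets tedious, so here is a small sketch of one way to diff them. It assumes you pass in the "properties" object from each `_mapping` response (on 6.x that object sits one level deeper, under the mapping type name). The helper names and sample mappings are hypothetical. A field that is `nested` in the original but a plain object in the reindexed copy is worth looking for: the Elasticsearch docs note that the `docs.count` reported by `_cat/indices` includes hidden nested documents, so a difference like that can change the reported count even when no top-level document is missing.

```python
def flatten_mapping(properties, prefix=""):
    """Return {dotted.field.path: type} for a typeless "properties" dict."""
    flat = {}
    for name, spec in properties.items():
        path = prefix + name
        # A field with sub-properties and no explicit type is a plain object.
        flat[path] = spec.get("type", "object")
        if "properties" in spec:
            flat.update(flatten_mapping(spec["properties"], path + "."))
    return flat


def diff_mappings(old_props, new_props):
    """Return {path: (old_type, new_type)} for every field whose type differs."""
    old, new = flatten_mapping(old_props), flatten_mapping(new_props)
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(set(old) | set(new))
        if old.get(path) != new.get(path)
    }


# Hypothetical mappings: the original index vs. a dynamically mapped copy.
old_props = {
    "user": {"type": "keyword"},
    "items": {"type": "nested", "properties": {"sku": {"type": "keyword"}}},
}
new_props = {
    "user": {"type": "text"},
    "items": {"properties": {"sku": {"type": "text"}}},
}
differences = diff_mappings(old_props, new_props)
for path, (before, after) in differences.items():
    print(f"{path}: {before} -> {after}")
```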
Which path do you want to follow? I'm a bit lost...
Using Snapshot/Restore?
Using reindex from remote?
Snapshot/Restore
Restore in 7.17, then open the migration assistant and follow the instructions. Then upgrade the 7.17 cluster to latest 8.x
Reindex from remote
In the old cluster, run:
GET /INDEXNAME
Take the output of this and run this in the 8.x cluster:
DELETE /INDEXNAME
PUT /INDEXNAME
{
// The json you got from the previous step
}
Then run the reindex-from-remote call.
That's basically the 2 options.
Let us know if something is unclear or does not work...
Note that the output of GET /INDEXNAME might include some unwanted metadata... If you're not sure which parts, share the output here and we'll help.
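As a rough illustration of the reindex-from-remote option, the sketch below prepares the two request bodies from a saved `GET /INDEXNAME` response. The function names, the sample response, and the host `http://old-cluster:9200` are assumptions for the example. The settings it removes (`creation_date`, `uuid`, `version`, `provided_name`) are ones Elasticsearch generates itself; treat the list as a starting point rather than a complete one. Unwrapping a single 6.x mapping type is a heuristic, and aliases are deliberately left out.

```python
import copy

# Settings generated by Elasticsearch that cannot be supplied on index creation.
GENERATED_SETTINGS = {"creation_date", "uuid", "version", "provided_name"}


def clean_index_definition(get_index_response, index_name):
    """Build a PUT /<index> body from a GET /<index> response."""
    definition = copy.deepcopy(get_index_response[index_name])
    index_settings = definition.get("settings", {}).get("index", {})
    for key in GENERATED_SETTINGS:
        index_settings.pop(key, None)
    mappings = definition.get("mappings", {})
    # Heuristic: 6.x nests the mapping under one type name; 8.x is typeless.
    if "properties" not in mappings and len(mappings) == 1:
        (only_type,) = mappings.values()
        if isinstance(only_type, dict) and "properties" in only_type:
            mappings = only_type
    return {"settings": {"index": index_settings}, "mappings": mappings}


def reindex_from_remote_body(remote_host, index_name, dest_index=None):
    """Build the POST /_reindex body that pulls index_name from the old cluster."""
    return {
        "source": {"remote": {"host": remote_host}, "index": index_name},
        "dest": {"index": dest_index or index_name},
    }


# Hypothetical GET /my-index response from the 6.x cluster.
sample_response = {
    "my-index": {
        "aliases": {},
        "mappings": {"doc": {"properties": {"items": {"type": "nested"}}}},
        "settings": {"index": {
            "creation_date": "1500000000000",
            "number_of_shards": "5",
            "uuid": "abc",
            "version": {"created": "6080099"},
            "provided_name": "my-index",
        }},
    }
}
create_body = clean_index_definition(sample_response, "my-index")
reindex_body = reindex_from_remote_body("http://old-cluster:9200", "my-index")
```

`create_body` is what would go into `PUT /INDEXNAME` on the new cluster, and `reindex_body` into `POST /_reindex`. The new cluster also needs the old host listed in `reindex.remote.whitelist` before it accepts the remote source.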
Since we can't jump 2 Elastic versions (from 6 to 8) by doing the Snapshot-Restore method while having the platinum license, I thought that the only way to solve this was to create a new Elastic 7 instance, snapshot-restore from 6 to 7, and then run the reindex method to "change" the data so it could be (then again) snapshot-restored from 7 to 8.
This was my initial idea, when people said to me that one method to solve the "archive" situation was to pass through 7...
Now it seems, I understood the situation completely wrong.
So, the right method is to snapshot from 6 to restore on 7, and then! upgrade the installation to Elastic 8?
Well... since Friday I've rebuilt my entire lab environment and I was finally able to migrate from 6 to 8 following your insights and advice.
What now is bugging me a lot is the actual process of ReIndexing via Kibana Upgrade Assistant.
If I perform the reindexing process on my Dev Tools, I lose a bunch of docs because of the situation around the "nested" thing.
But if I perform the "same" process in the Upgrade Assistant, everything performs correctly.
The huge problem that I have is that this process, in the WebUI of Kibana, is a manual one.
I have more than 1000 indices that I need to migrate, and therefore reindex, via the Upgrade Assistant... and right now, since I need to click a few times to change some check boxes, etc, etc, it all works... but this is not doable for 1000 and more indices.
My question is... if a simple reindex source-destination command doesn't work in dev tools (nested situation makes my docs fly away into the fifth dimension)... and in Kibana Upgrade Assistant everything goes perfectly... how the heck am I supposed to do this for 1000 and more indices????
EDIT: Btw, the upgrade assistant asks for the following changes, which need to be checked/confirmed via manual input:
Index replacement - basically reindexing and creating the alias...
Mapping replacement - doc with _doc
Removing index.soft_deletes.enabled (deprecated).
And then... magically... it does its job... /sadface... XD
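One way to avoid clicking through 1000+ indices is to script the same kind of steps. The sketch below only plans the requests and sends nothing. It is a guess at an equivalent sequence based on the three changes listed above, not the Upgrade Assistant's actual implementation. The function names, the `reindexed-v7-` prefix, and the sample definition are assumptions, and each definition is expected to be an already typeless settings/mappings body (so `nested` fields keep their mapping in the new index).

```python
import copy


def plan_index_migration(index_name, definition, prefix="reindexed-v7-"):
    """Return the ordered (method, path, body) requests for one index."""
    new_index = prefix + index_name
    body = copy.deepcopy(definition)
    # Drop the deprecated soft_deletes setting flagged by the Upgrade Assistant.
    body.get("settings", {}).get("index", {}).pop("soft_deletes", None)
    return [
        # Stop writes to the old index so nothing changes while it is copied.
        ("PUT", f"/{index_name}/_settings", {"index.blocks.write": True}),
        # Create the destination first, so the mapping is not guessed.
        ("PUT", f"/{new_index}", body),
        ("POST", "/_reindex?wait_for_completion=true",
         {"source": {"index": index_name}, "dest": {"index": new_index}}),
        # Swap in one call: delete the old index, alias its name to the new one.
        ("POST", "/_aliases", {"actions": [
            {"remove_index": {"index": index_name}},
            {"add": {"index": new_index, "alias": index_name}},
        ]}),
    ]


def plan_all(definitions):
    """Plan every index in a {index_name: definition} dict."""
    return {name: plan_index_migration(name, d) for name, d in definitions.items()}


# Hypothetical cleaned definition for a single index.
definitions = {
    "logs-1": {
        "settings": {"index": {"number_of_shards": "1",
                               "soft_deletes": {"enabled": "true"}}},
        "mappings": {"properties": {"items": {"type": "nested"}}},
    }
}
plans = plan_all(definitions)
for method, path, body in plans["logs-1"]:
    print(method, path)
```

Since the last step deletes the original index, it would be sensible to try this on a handful of indices first and compare counts with the `_count` API on both sides before the alias swap.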