How to take data from old server and merge it with data on new server

John_Thornborrow · August 19, 2015, 9:42am

Hi,

We've recently upgraded our hardware, and have now got a replacement ES cluster.

I want to restore the indices on our old cluster into/onto the new cluster without losing what is currently on the new cluster. In other words, I want to merge the data.

What is the best way to achieve this? I've read the snapshot/restore docs and it appears that restore doesn't have a non-destructive/merge feature, only a "drop and recreate" (to borrow an antiquated phrase) method.

This is all on Windows, which I'm sure I'll be told makes things even more difficult.

Thanks for any help,
John

colings86 · August 19, 2015, 9:48am

You could write a small program using one of the official Elasticsearch clients, which uses the scan-scroll API and the bulk API to pull the data out of the old cluster and index it into the new cluster.
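
A rough sketch of that loop follows. It is in C# to match the rest of the thread, but it is deliberately not tied to any particular NEST version. The three delegates (`openScroll`, `nextPage`, `bulkIndex`) and the `ScrollPage` record are hypothetical placeholders for the real client calls. It assumes C# 9 or later for the record syntax.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: one batch of scroll results plus the scroll id for the next request.
public sealed record ScrollPage<THit>(IReadOnlyList<THit> Hits, string ScrollId);

public static class ScrollCopier
{
    // Copies every hit from the old cluster to the new one, one bulk request per batch.
    // The delegates are hypothetical stand-ins for real client calls.
    public static int CopyAll<THit>(
        Func<ScrollPage<THit>> openScroll,        // initial search opened with a scroll timeout
        Func<string, ScrollPage<THit>> nextPage,  // follow-up scroll request by scroll id
        Action<IReadOnlyList<THit>> bulkIndex)    // bulk request against the new cluster
    {
        int copied = 0;

        void Flush(IReadOnlyList<THit> hits)
        {
            if (hits.Count == 0) return; // Elasticsearch rejects an empty bulk request
            bulkIndex(hits);
            copied += hits.Count;
        }

        // With search_type=scan (ES 1.x) the first response carries a scroll id but no hits,
        // so an empty first page must not end the copy.
        var page = openScroll();
        Flush(page.Hits);

        while (true)
        {
            // Always continue from the most recently returned scroll id.
            page = nextPage(page.ScrollId);
            if (page.Hits.Count == 0) break; // scroll exhausted
            Flush(page.Hits);
        }
        return copied;
    }
}
```

Keeping the client calls behind delegates means the paging logic can be checked against fake pages before it is pointed at a real cluster.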

The challenge here, from Elasticsearch's perspective, is how to perform the 'merge' of the data. If there is a clash between the clusters (i.e. a document already exists on the new cluster when you index from the old cluster), what should ES do? Keep the new one? The old one? Or maybe there is something in the document that would indicate which one to keep? These are questions that are much easier for you to answer in a custom application/tool than for Elasticsearch to provide options for out of the box.
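
If you do want to pick a winner, the bulk API already covers the two simple cases. A `create` action fails for any `_id` that already exists (that item gets a conflict error while the rest of the batch still goes through), so the new cluster's copy survives. An `index` action overwrites it instead. Below is a minimal sketch of building the newline-delimited action/source pair for one hit. `ClashPolicy` and `BulkBody` are made-up names, and the sketch assumes `System.Text.Json` (.NET Core 3.0+). The `_type` field reflects the 1.x-era clusters in this thread; later Elasticsearch versions dropped mapping types.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical policy names; each maps onto a real bulk action.
public enum ClashPolicy
{
    KeepNewCluster, // "create": an existing _id is left alone (that item fails with a conflict)
    KeepOldCluster  // "index": whatever is on the new cluster is overwritten
}

public static class BulkBody
{
    // Builds one action line plus one source line for the bulk API.
    // sourceJson is assumed to be the hit's _source serialized as single-line JSON.
    public static string ActionLine(ClashPolicy policy, string index, string type, string id, string sourceJson)
    {
        string action = policy == ClashPolicy.KeepNewCluster ? "create" : "index";
        var meta = new Dictionary<string, Dictionary<string, string>>
        {
            [action] = new Dictionary<string, string>
            {
                ["_index"] = index,
                ["_type"] = type,
                ["_id"] = id
            }
        };
        // Every line of a bulk body, including the last one, must end with "\n".
        return JsonSerializer.Serialize(meta) + "\n" + sourceJson + "\n";
    }
}
```

The third option (a field in the document decides) needs a comparison per document, which is exactly the kind of rule that belongs in your own tool.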

John_Thornborrow · August 19, 2015, 10:16am

Thanks for the info.

It so happens that our documents use GUIDs as IDs, so clashes "won't" happen (famous last words?). That said, I fully understand the predicament.

Regards,
John

colings86 · August 19, 2015, 10:19am

In which case another option could be to restore the old cluster's data to a different index in the new cluster and use an alias to query both at the same time.
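
For the alias route, restore the old data under a different index name first (the restore API's `rename_pattern` / `rename_replacement` options can do the renaming). After that, a single `POST /_aliases` call can point one alias at both indices. Below is a sketch of building that request body. It again assumes `System.Text.Json`, and `AliasBody` is a made-up helper name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class AliasBody
{
    // Builds the body for POST /_aliases with one "add" action per index,
    // so a single search against the alias covers all of them.
    public static string AddAlias(string alias, IEnumerable<string> indices)
    {
        var body = new
        {
            actions = indices.Select(index => new { add = new { index, alias } }).ToList()
        };
        return JsonSerializer.Serialize(body);
    }
}
```

For example, `AliasBody.AddAlias("orders_all", new[] { "orders_old", "orders" })` (hypothetical names) adds both indices to `orders_all`. Searches against the alias then cover both. Writes are a different story: Elasticsearch of this era rejects indexing through an alias that points at more than one index, so new documents should still go to a concrete index.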

John_Thornborrow · August 19, 2015, 2:38pm

I've gone for the small app option with scroll and bulk API calls, but I notice I am not getting the _type information - which is quite essential to what we are doing.

I seem to remember that ES simply doesn't return this field in a search, but surely there's a way to force it to?

Edit: using C# and NEST.

John_Thornborrow · August 19, 2015, 2:59pm

Actually, I can see what's happening here. I am getting the _id/_type/etc. fields returned, but NEST is not including them when it parses the hits into the Documents collection. This is going to take some wrestling.

Martijn_Laarman · August 19, 2015, 3:38pm

Hey John, the .Documents collection is a special view on .Hits that returns only the _source of each hit.

Loop over .Hits if you need the document metadata.

John_Thornborrow · August 19, 2015, 3:57pm

Hi Martijn,

Yes, absolutely right. The end result looked something like this:

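        // Assumes searchResponse is an ISearchResponse<object> from the OLD cluster
        // and nestClient talks to the NEW cluster. Metadata comes from .Hits, and
        // each hit's _source is passed in as the document to index.
        nestClient.Bulk(new BulkRequest()
        {
            Operations = searchResponse.Hits.Select(x => new BulkIndexOperation<object>(x.Source)
            {
                Id = x.Id,
                Index = x.Index,
                Type = x.Type
            }).Cast<IBulkOperation>().ToList()
        });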

Inside the scroll loop.
