How to take data from old server and *merge* it with data on new server


(John Thornborrow) #1

Hi,

We've recently upgraded our hardware, and have now got a replacement ES cluster.

I want to restore the indices on our old cluster into/onto the new cluster without losing what is currently on the new cluster. I.e. I want to merge the data.

Which is the best way to achieve this? I've read the snapshot/restore docs and it appears that the restore doesn't have a non-destructive/merge feature, only a "drop and recreate" (to borrow an antiquated phrase) method.

This is all on Windows, which I'm sure I'll be told makes things even more difficult.

Thanks for any help,
John


(Colin Goodheart-Smithe) #2

You could write a small program using one of the official Elasticsearch clients, which uses the scan-scroll API and the bulk API to pull the data out of the old cluster and index it into the new cluster.

The challenge here from Elasticsearch's perspective is how to perform the 'merge' of the data. If there is a clash between the clusters (i.e. a document already exists on the new cluster when you index from the old cluster), what should ES do? Keep the new one? The old one? Or maybe there is something in the document that would indicate which one to keep? These are questions that are much easier for you to answer in a custom application/tool than for Elasticsearch to provide options for out of the box.
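To make the clash question concrete, here is a minimal sketch of how a custom tool might encode that decision when it builds the body for the bulk API. It relies on the bulk `create` action being rejected for an `_id` that already exists, while `index` overwrites. The `Hit` record, the `ClashPolicy` enum and the `BulkBody` class are hypothetical names made up for illustration (they are not part of any client library), and the `_type` field only applies to Elasticsearch versions that still have mapping types.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical shape for one scroll hit: its metadata plus the raw _source as single-line JSON.
public record Hit(string Index, string Type, string Id, string SourceJson);

// Hypothetical policy switch: which copy survives when an _id exists on both clusters.
public enum ClashPolicy { KeepNew, KeepOld }

public static class BulkBody
{
    // Builds the newline-delimited body for the _bulk endpoint:
    // one action line followed by one source line per document.
    public static string Build(IEnumerable<Hit> hits, ClashPolicy policy)
    {
        // "create" is rejected when the _id already exists, so the new cluster's copy is kept.
        // "index" overwrites, so the old cluster's copy wins.
        string action = policy == ClashPolicy.KeepNew ? "create" : "index";

        var body = new StringBuilder();
        foreach (var hit in hits)
        {
            var meta = new Dictionary<string, object>
            {
                [action] = new Dictionary<string, string>
                {
                    ["_index"] = hit.Index,
                    ["_type"] = hit.Type,
                    ["_id"] = hit.Id
                }
            };
            body.Append(JsonSerializer.Serialize(meta)).Append('\n');
            // The source must not contain raw newlines, or the bulk format breaks.
            body.Append(hit.SourceJson).Append('\n');
        }
        return body.ToString();
    }
}
```

With `ClashPolicy.KeepNew`, a clash shows up as a per-item conflict in the bulk response rather than silently replacing the document, which gives the tool a place to log or inspect it.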


(John Thornborrow) #3

Thanks for the info.

It so happens that our indexes are using GUIDs, so clashes "won't" happen (famous last words?). Though I fully understand the predicament.

Regards,
John


(Colin Goodheart-Smithe) #4

In which case another option could be to restore the old cluster's data to a different index in the new cluster and use an alias to query both at the same time.
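As a rough sketch of the alias half of that suggestion: the `_aliases` endpoint accepts a list of `add` actions, so one alias can point at both the existing index and the restored one. The index and alias names below are placeholders, and `AliasRequest` is a hypothetical helper that only builds the JSON body; sending it (a POST to `/_aliases`) is left to whichever client you use.

```csharp
using System;
using System.Linq;
using System.Text.Json;

public static class AliasRequest
{
    // Builds the JSON body for POST /_aliases that attaches one alias to several indices.
    public static string Build(string alias, params string[] indices)
    {
        var actions = indices
            .Select(index => new { add = new { index, alias } })
            .ToArray();
        return JsonSerializer.Serialize(new { actions });
    }
}

public static class AliasDemo
{
    public static void Main()
    {
        // Placeholder names: "documents" is the live index, "documents-old" the restored copy.
        Console.WriteLine(AliasRequest.Build("documents-all", "documents", "documents-old"));
    }
}
```

Searches against the alias then fan out to both indices. Writes still need to target a concrete index, since an alias that spans more than one index cannot be indexed into directly.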


(John Thornborrow) #5

I've gone for the small app option with scroll and bulk API calls, but I notice I am not getting the _type information - which is quite essential to what we are doing.

I seem to remember ES just simply doesn't return this field in a search - but surely there's a way to force it to do so?

edit: Using C# and NEST.


(John Thornborrow) #6

Actually, I can see what's happening here. I am getting the _id/_type/etc. fields returned, but NEST is not including them when it parses the hits into the Documents collection. This is going to take some wrestling.


(Martijn Laarman) #7

Hey John, the .Documents collection is a special view on .Hits that returns only the _source of each hit.

Loop over .Hits if you need the document metadata.


(John Thornborrow) #8

Hi Martijn,

Yes, absolutely right. The end result saw something similar to this:

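        // Assumes searchResponse is an ISearchResponse<object>; x.Source is the hit's _source,
        // which has to be passed along or the operation has no document to index.
        nestClient.Bulk(new BulkRequest()
        {
            Operations = searchResponse.Hits.Select(x => new BulkIndexOperation<object>(x.Source)
            {
                // Carry the metadata over so each document keeps its id, index and type.
                Id = x.Id,
                Index = x.Index,
                Type = x.Type
            }).Cast<IBulkOperation>().ToList()
        });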

Inside the scroll loop.

