Copy Data from ClusterOld to ClusterNew into the same index


#1

Hi,
I have two ES clusters with the same index structure. I want to merge the data from cluster1 into cluster2 (same index).

I wrote a small Java program to do this using the Transport Client API. It works for a small amount of data but fails when merging 70 GB: after a while it throws "org.elasticsearch.transport.ReceiveTimeoutTransportException:...", which leads to data loss.

Is there a built-in process in ES for this, i.e. merging data across clusters?


(Ramy) #2

Hi Stesig,
Did you try snapshot and restore?
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
I've never used it, but I think this is the right way to copy data from one cluster to another.
Cheers, Ramy


#3

I have not tried it, but I have already read this article, and I think restoring will overwrite the data in the index on the destination cluster. I need to merge the data from cluster1.indexA and cluster2.indexA.

I will try this but I'm not convinced that it will work.


(Ramy) #4

Is it important to merge them? It would probably be easier to save the copied index under another name and, when you want to query, query both indices!


#5

An application is accessing the database, and it is not possible to change all its queries just because I can't merge two indices from two clusters. Programmatically it is quite easy: create two clients, e.g. Transport Clients (Java API), connect them to the two clusters (src/dst), and copy the data from clusterA to clusterB.
It works with small amounts of data, but if the process runs for hours (~80GB), ES sometimes throws "org.elasticsearch.transport.ReceiveTimeoutTransportException" and the bulk operation fails (data loss). I don't intend to spend much time on the small Java copy program to work around ES exceptions. I don't know why this exception happens and could not find a solution. Both clusters and both Transport Clients run on the same machine, so there is no real network traffic. Therefore, I was looking for a third-party tool or an alternative way.

Thanks
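
The copy loop described in this post (read from the source cluster, bulk-write to the destination) can be sketched so that one timed-out bulk request does not silently drop a batch. The sketch below is only an illustration, not the poster's program: `BatchCopier` and `BulkSink` are hypothetical names, and no Elasticsearch classes are used. In a real copy, the iterator would presumably wrap a scroll over the source index and the sink would send one bulk request to the destination. Retrying a batch is only safe under the assumption that every document is written with its original ID, so that a repeated write overwrites the document rather than duplicating it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Copies documents from a source to a destination in small, retried batches. */
final class BatchCopier {

    /** Hypothetical stand-in for one bulk request against the destination cluster. */
    interface BulkSink<T> {
        void write(List<T> batch) throws Exception;
    }

    /**
     * Streams documents from the source into the sink in batches of batchSize.
     * A failing batch is attempted at most maxAttempts times in total; if it
     * still fails, the exception is rethrown and the copy stops, so a batch is
     * never skipped unnoticed. Returns the number of documents written.
     */
    static <T> long copy(Iterator<T> source, BulkSink<T> sink,
                         int batchSize, int maxAttempts, long backoffMillis)
            throws Exception {
        if (batchSize < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("batchSize and maxAttempts must be >= 1");
        }
        long copied = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            // Flush when the batch is full or the source is exhausted.
            if (batch.size() == batchSize || !source.hasNext()) {
                writeWithRetry(sink, batch, maxAttempts, backoffMillis);
                copied += batch.size();
                batch = new ArrayList<>(batchSize);
            }
        }
        return copied;
    }

    /** Writes one batch, waiting a little longer after each failed attempt. */
    private static <T> void writeWithRetry(BulkSink<T> sink, List<T> batch,
                                           int maxAttempts, long backoffMillis)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                sink.write(batch);
                return;
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up loudly instead of losing the batch
                }
                Thread.sleep(backoffMillis * attempt); // linear backoff
            }
        }
    }
}
```

Smaller batches keep each bulk request short, which may make a receive timeout less likely, and aborting on a batch that keeps failing means the copy can be restarted instead of ending with undetected gaps. Whether this removes the timeouts seen here is not certain, since their cause was not identified in this thread.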


(Ramy) #6

OK, I see your problem. Logstash will probably help!


(Mark Walkom) #7

Check out https://gist.github.com/markwalkom/8a7201e3f6ea4354ae06

