Hi,
I have 2 ES clusters with the same index structure. I want to merge data from cluster1 into cluster2 (same index).
I wrote a small Java program to do this using the Transport Client API. It works for a small amount of data but failed to merge 70GB of data. After a while it throws the exception "org.elasticsearch.transport.ReceiveTimeoutTransportException:...", which leads to data loss.
Is there a built-in process in ES to do this work, i.e. merging data across clusters?
I have not tried it, but I already read this article, and I think restoring would overwrite the data in the index of the destination cluster. I need to merge the data of cluster1.indexA and cluster2.indexA.
I will try this but I'm not convinced that it will work.
Is it important to merge them? It would probably be easier to save the copied index under another name and, when you want to query, query both indices!
An application is accessing the database, and it is not possible to change all its queries just because I can't merge two indices from two clusters. Programmatically it is quite easy: create two clients, e.g. Transport Clients (Java API), connect them to the two clusters (src/dst), and copy the data from clusterA to clusterB.
It works with small data, but if the process runs for hours (~80GB), ES sometimes throws "org.elasticsearch.transport.ReceiveTimeoutTransportException" and the bulk operation fails (data loss). I don't intend to spend much time on the small Java copy program to work around ES exceptions. I don't know why this exception happens and could not find a solution. Both clusters and the Transport Clients run on the same machine, so there is no real network traffic. Therefore, I was looking for a third-party tool or an alternative way.
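For what it's worth, here is a minimal sketch of how such a copy loop could avoid losing a batch when a single bulk call times out: send fixed-size batches and retry a failed batch a few times before giving up. This is not the original program and does not explain why the timeout happens. `BatchCopy`, `BatchSink`, and `copyWithRetry` are hypothetical names; `BatchSink.send` only stands in for whatever call sends one bulk request to the destination cluster. It also assumes each document is written with its source `_id`, so that resending a batch overwrites documents instead of duplicating them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class BatchCopy {

    /** Hypothetical stand-in for "send one bulk request to the destination cluster". */
    interface BatchSink<T> {
        void send(List<T> batch) throws Exception;
    }

    /**
     * Reads documents from the source iterator, groups them into batches, and
     * sends each batch, retrying a failed batch up to maxAttempts times.
     * Returns the number of documents handed to the sink successfully.
     */
    static <T> int copyWithRetry(Iterator<T> source, BatchSink<T> sink,
                                 int batchSize, int maxAttempts, long backoffMillis)
            throws Exception {
        if (batchSize < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("batchSize and maxAttempts must be >= 1");
        }
        int copied = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            // Flush when the batch is full or the source is exhausted.
            if (batch.size() == batchSize || !source.hasNext()) {
                sendWithRetry(sink, batch, maxAttempts, backoffMillis);
                copied += batch.size();
                batch = new ArrayList<>(batchSize);
            }
        }
        return copied;
    }

    private static <T> void sendWithRetry(BatchSink<T> sink, List<T> batch,
                                          int maxAttempts, long backoffMillis)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                sink.send(batch);
                return;
            } catch (Exception e) {
                // Give up after the last attempt so the caller sees the failure.
                if (attempt >= maxAttempts) {
                    throw e;
                }
                // Wait a little longer after each failed attempt.
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }
}
```

In a real copy program the sink would wrap the bulk request to the destination cluster, and smaller batches may make each request less likely to hit the timeout, though I have not verified that for this case.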