Migrating Elasticsearch data

Hey Everyone,

I have a disconnected Elasticsearch environment [1] and I need to transfer its data to another Elasticsearch cluster [2].
I was thinking of exporting all of the data from [1] to a JSON file (with a Logstash pipeline), transferring the JSON file to the [2] cluster, and then using the bulk API to import the indices.

Is there a better way to perform this operation, given that clusters [1] and [2] are not connected via the internet?

Hi Shahaf,
A faster way is snapshot/restore.
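Roughly like this (a sketch; my_backup, snapshot_1, and /mnt/es_backup are placeholder names, and the repository location must be whitelisted via path.repo in elasticsearch.yml):

# On the source cluster: register a filesystem snapshot repository
curl -XPUT 'localhost:9200/_snapshot/my_backup' -H 'Content-Type: application/json' -d '{ "type": "fs", "settings": { "location": "/mnt/es_backup" } }'

# Take a snapshot of all indices
curl -XPUT 'localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

# Copy the repository directory to the target machine, register the same
# repository on the target cluster, then restore the snapshot there
curl -XPOST 'localhost:9200/_snapshot/my_backup/snapshot_1/_restore'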

Hey @wangqinghuan
I forgot to mention that I am not allowed to change the Elasticsearch configuration (such as adding path.repo to elasticsearch.yml).
Therefore I am trying to export the Elasticsearch indices to a JSON file and import it with the bulk API into the other cluster.

This is my Logstash pipeline:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "*"
    docinfo => true
    size => 10000
  }
}

output {
  file {
    path => "/var/log/logstash/index.json"
    codec => json_lines
  }
}

After getting the JSON file, the bulk API does not seem to work on the second cluster. Am I doing something wrong?

Bulk API call:
curl -v -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk?pretty' --data-binary @/var/log/logstash/index.json

Does index.json contain the action_and_meta_data lines? The bulk API expects this NDJSON structure:

{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }

You could use elasticdump: https://www.npmjs.com/package/elasticdump
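
For example (the index name my_index and the file paths are placeholders):

# Export an index's documents from the source cluster into a file
elasticdump --input=http://localhost:9200/my_index --output=/tmp/my_index_data.json --type=data

# Then, on the target side, import the file into the new cluster
elasticdump --input=/tmp/my_index_data.json --output=http://localhost:9200/my_index --type=data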

@wangqinghuan
This is probably my issue; my file does not have the first line:
{ "index" : { "_index" : "test", "_id" : "1" } }

How can I export all of my indices to one file, with the structure you mentioned, using a Logstash pipeline?

@sspilleman
That approach requires configuration changes on my side, which in my case is not an option.

I don't know how to export the action_and_meta_data lines with a Logstash pipeline. However, you can use Logstash to import index.json into your new Elasticsearch cluster, along the lines of the sketch below.
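
A minimal sketch, assuming the new cluster is reachable on localhost:9200 (restored-index is a placeholder name):

input {
  file {
    path => "/var/log/logstash/index.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => json
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "restored-index"
  }
}

Note that this does not preserve the original index names; that information is exactly what the export above is missing.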

I found this Python script:

#!/usr/bin/env python3
# Prepend a bulk action_and_meta_data line to every exported document.
# The result is printed to stdout, so redirect it to a file.
filepath = '<PATH TO JSON FILE>'
metadata = '{ "index": { "_index": "INDEX_NAME", "_type": "_doc" } }'
with open(filepath, mode="r", encoding="utf-8") as my_file:
    for line in my_file:
        # Emit the action line, then the document source itself.
        print(metadata)
        print(line.rstrip("\n"))

I ran it against the JSON export from the first Elasticsearch cluster; the script added the missing action_and_meta_data lines, and then I could use the bulk API to push the data into the second Elasticsearch cluster.
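
The commands looked roughly like this (add_metadata.py is simply whatever name you save the script under):

# The script writes to stdout, so redirect the output to a file
python3 add_metadata.py > /var/log/logstash/bulk.json

# Push the result to the second cluster
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk?pretty' --data-binary @/var/log/logstash/bulk.json

For very large exports, the file may need to be split into smaller chunks, since Elasticsearch rejects request bodies larger than http.max_content_length (100mb by default).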

If there is a way to export the data in the following format:

{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }

that would be the preferred way.

Anyway, thank you for your help, @wangqinghuan!
