I have a situation where I have a disconnected Elasticsearch cluster [1] and I need to transfer its data to another Elasticsearch cluster [2].
My plan is to export all of the data from [1] to a JSON file (with a Logstash pipeline), transfer the JSON file to cluster [2], and then import the indices with the bulk API.
Is there a better way to perform this operation, given that clusters [1] and [2] are not connected via the internet?
Hey @wangqinghuan
I did not mention that I am not allowed to change the Elasticsearch configuration (for example, adding "path.repo" to elasticsearch.yml), so a snapshot repository is not an option.
Therefore I am trying to export the Elasticsearch indices to a JSON file and import them into the other cluster with the bulk API.
This is my Logstash pipeline:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "*"
    docinfo => true
    size => 10000
  }
}
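The pipeline above only has an input section. A minimal sketch of a matching output section, assuming the export should land in a local file such as /tmp/export.json (the path is my assumption, not from the original post), could look like this:

output {
  file {
    path => "/tmp/export.json"   # assumed export location
    codec => json_lines          # writes one JSON document per line
  }
}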
I don't know how to export the action_and_meta_data line with a Logstash pipeline. However, you can use Logstash to import index.json into your new Elasticsearch cluster.
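A minimal sketch of such an import pipeline; the file path, host, and target index name below are assumptions on my part, not from the thread:

input {
  file {
    path => "/path/to/index.json"   # assumed location of the exported file
    start_position => "beginning"
    sincedb_path => "/dev/null"     # do not remember the read position between runs
    codec => json                   # each line is one JSON document
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]     # assumed address of the new cluster
    index => "restored-index"       # hypothetical target index name
  }
}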
#!/usr/bin/env python3
# Prepend a bulk action_and_meta_data line to every JSON document in the export.
filepath = '<PATH TO JSON FILE>'
metadata = '{ "index": { "_index": "INDEX_NAME", "_type": "_doc" } }'
with open(filepath, mode="r", encoding="utf-8") as my_file:
    for line in my_file:
        print(metadata)           # action_and_meta_data line
        print(line.rstrip("\n"))  # the document source itself
I ran it against the exported JSON I got from the first Elasticsearch cluster; the script added the missing action_and_meta_data lines, and I could then use the bulk API to push the result to the second Elasticsearch cluster.
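For completeness, this is roughly how the generated file could be pushed to the second cluster over HTTP; the host, port, and file name are assumptions, and curl with --data-binary would work just as well:

import requests

# Stream the bulk payload (one action line + one source line per document)
# to the _bulk endpoint of the target cluster.
with open("bulk_payload.json", "rb") as payload:        # hypothetical output of the script above
    response = requests.post(
        "http://localhost:9200/_bulk",                   # assumed address of cluster [2]
        data=payload,
        headers={"Content-Type": "application/x-ndjson"},
    )
response.raise_for_status()
print("bulk errors:", response.json().get("errors"))

For large exports the payload would need to be split into chunks smaller than the cluster's http.max_content_length (100 MB by default).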
Is there a way to export the data directly in the following format, so that the extra script step is not needed?
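(i.e. the bulk/NDJSON layout that the script above produces; INDEX_NAME and the field values are placeholders)

{ "index": { "_index": "INDEX_NAME", "_type": "_doc" } }
{ "field1": "value1", "field2": "value2" }
{ "index": { "_index": "INDEX_NAME", "_type": "_doc" } }
{ "field1": "value3", "field2": "value4" }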