Is there a Hadoop InputFormat to read ES snapshot files in Hadoop?

bewang.tech · December 10, 2015, 12:35am

I need to import the data from an Elasticsearch cluster, but I cannot read the cluster directly. I know I could restore the snapshot into a cluster; I am just wondering whether I can read the snapshot files directly from Hadoop.

Another question: does repository-hdfs write in the same format as a local file system repository?

costin · December 10, 2015, 6:20am

You can read the snapshot, but you would have to do it yourself. Note that there are no guarantees about its format or that it will remain the same across versions.

repository-hdfs only exposes HDFS to the Snapshot API; it does not alter or interfere with the file format.

chenryn · May 11, 2016, 2:58am

Hundreds of days later, is there any new open-source repo that implements this?

costin · May 18, 2016, 6:51am

Not that I'm aware of, and furthermore it's not something I'd recommend. It's much faster or easier to just export the data as JSON or in some other format, since a snapshot contains not just the data but also the ES metadata, which is version-specific and meant for ES only.
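
A minimal sketch of such a JSON export using the scroll API, in Python. It assumes a cluster reachable over HTTP (the `http://localhost:9200` address is hypothetical), an Elasticsearch version whose `/_search/scroll` endpoint accepts a JSON body with `scroll` and `scroll_id` and which supports sorting on `_doc`, and it uses hypothetical helper names (`make_http_post`, `scroll_documents`, `export_ndjson`). Each document is written as one JSON line, which Hadoop can then read as plain text.

```python
import json
import urllib.request
from typing import Callable, Iterator, TextIO

# Hypothetical transport type: post(path, body) -> decoded JSON response.
Post = Callable[[str, dict], dict]


def make_http_post(base_url: str) -> Post:
    """Build a POST helper for a cluster at base_url (hypothetical address)."""
    def post(path: str, body: dict) -> dict:
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return post


def scroll_documents(post: Post, index: str, batch_size: int = 1000,
                     keep_alive: str = "1m") -> Iterator[dict]:
    """Yield the _source of every document in `index`, one scroll page at a time."""
    # The initial search opens the scroll context; sorting on _doc is the cheapest order.
    resp = post(f"/{index}/_search?scroll={keep_alive}",
                {"size": batch_size, "sort": ["_doc"],
                 "query": {"match_all": {}}})
    while True:
        hits = resp["hits"]["hits"]
        if not hits:  # an empty page means the scroll is exhausted
            break
        for hit in hits:
            yield hit["_source"]
        # Fetch the next page, renewing the context for another keep_alive period.
        resp = post("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": resp["_scroll_id"]})


def export_ndjson(post: Post, index: str, out: TextIO) -> int:
    """Write each document as one JSON line to `out`; return the number written."""
    count = 0
    for doc in scroll_documents(post, index):
        out.write(json.dumps(doc) + "\n")
        count += 1
    return count
```

For example, `export_ndjson(make_http_post("http://localhost:9200"), "logs", f)` with `f` an open text file. The transport is passed in as a plain function so it can be swapped for an official client; the scroll context is simply left to expire after `keep_alive` rather than being cleared explicitly.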

chenryn · June 3, 2016, 7:56am

Using the scroll API to export data as JSON is much slower than the snapshot API.
I just thought that running some Hadoop jobs by writing ES queries could be easier than writing pure batch MapReduce code. We could also save some space: if we don't need to transfer data to both ES and Hadoop, we just need to snapshot the cold data to Hadoop.

Something like Splunk's Hunk product.
