Use snapshots from Hadoop?

Just curious, is it possible to read a snapshot with Java? (Especially with Hadoop technologies such as Hive, Spark, or Pig.)

We have a ton of snapshots and would like to read the data, but I don't want to restore them one by one for that.

Many thanks,


There was some talk a while ago about potentially supporting this in some fashion, but ultimately it was decided against. When you create a snapshot in Elasticsearch, you're essentially copying the Lucene index files to a blob storage location, but those Lucene files can change from release to release and require a reader matching the Lucene version that ES ships with. It's an idea with a lot of potential for performance improvement, but the drawbacks we're currently finding with it mean that we're not pursuing it at the moment.
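To make the version coupling concrete: even if you pulled the files out of the repository yourself, you'd be reading raw Lucene. Here's a minimal sketch in Java, assuming a plain Lucene index directory on disk at a hypothetical path (snapshot blobs are actually renamed and wrapped in repository metadata, so they can't be opened like this as-is) and a Lucene version on the classpath matching the one ES wrote the index with:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

import java.nio.file.Paths;

public class ReadLuceneIndex {
    public static void main(String[] args) throws Exception {
        // Hypothetical path to a plain Lucene index, e.g. a restored ES shard
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("/tmp/restored-index")))) {
            // Iterate over document ids (ignoring deleted docs for brevity)
            for (int i = 0; i < reader.maxDoc(); i++) {
                Document doc = reader.document(i);
                // Elasticsearch keeps the original JSON in the "_source" stored field
                BytesRef source = doc.getBinaryValue("_source");
                if (source != null) {
                    System.out.println(source.utf8ToString());
                }
            }
        }
    }
}
```

If the Lucene version on the classpath is older than the one that wrote the segments, `DirectoryReader.open` will fail, which is exactly the compatibility problem described above.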

D'oh ;-(

Currently we are using Spark, with the fantastic Es4Hadoop plugin, to export ES indices into Parquet files. This works fine, but it involves two extra technologies (Spark / Parquet), whereas I would prefer to keep ES only.
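For anyone curious, the export looks roughly like this. A minimal sketch in Java with the elasticsearch-hadoop Spark SQL connector; the host, index name, and output path are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EsIndexToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("es-index-to-parquet")
                // Hypothetical cluster address; es.nodes is a standard es-hadoop setting
                .config("es.nodes", "es-host:9200")
                .getOrCreate();

        // Read the whole index over HTTP through the elasticsearch-hadoop connector
        Dataset<Row> df = spark.read()
                .format("org.elasticsearch.spark.sql")
                .load("my-index"); // hypothetical index name

        // Write it back out as Parquet (hypothetical HDFS path)
        df.write().parquet("hdfs:///exports/my-index");

        spark.stop();
    }
}
```

Note that this pulls every document over the REST API via scroll queries, which is why reading snapshot files directly would have been so much faster for us.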

Thank you for the explanation.

I want to vote for this as well. My Spark job reads from the cluster directly and it puts a big load on the cluster. This feature could definitely help with batch jobs.


Yeah, absolutely. Especially in the big-data world, it makes no sense to use HTTP for data transfer, it's so slow (also, es4hadoop doesn't use gzip, grr).

In my case, I am going to move from Elasticsearch to druid.io, which is more big-data friendly.


Hi, how do you create Parquet files from an ES index snapshot?
