Use snapshots from Hadoop?

Just curious, is it possible to read a snapshot with Java? (Especially with Hadoop technologies such as Hive, Spark, or Pig.)

We have a ton of snapshots and would like to read the data, but I don't want to restore them one by one for that.

Many thanks,


There was some talk a while ago about potentially supporting this in some fashion, but ultimately it was decided against. When you create a snapshot in Elasticsearch, you're essentially copying the Lucene index files to a blob storage location, but those Lucene files can change from release to release and require a reader matching the Lucene version that ES ships with. It's an idea with a lot of potential for performance improvement, but the drawbacks we're currently finding with it mean that we're not pursuing it at the moment.
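To make the version coupling concrete: even if you pulled the files out of the repository yourself, you'd be reading raw Lucene. Here's a minimal sketch in Java, assuming a plain Lucene index directory on disk at a hypothetical path (snapshot blobs are actually renamed and wrapped in repository metadata, so they can't be opened like this as-is) and a Lucene version on the classpath matching the one ES wrote the index with:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

import java.nio.file.Paths;

public class ReadLuceneIndex {
    public static void main(String[] args) throws Exception {
        // Hypothetical path to a plain Lucene index, e.g. a restored ES shard
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("/tmp/restored-index")))) {
            // Iterate over document ids (ignoring deleted docs for brevity)
            for (int i = 0; i < reader.maxDoc(); i++) {
                Document doc = reader.document(i);
                // Elasticsearch keeps the original JSON in the "_source" stored field
                BytesRef source = doc.getBinaryValue("_source");
                if (source != null) {
                    System.out.println(source.utf8ToString());
                }
            }
        }
    }
}
```

If the Lucene version on the classpath is older than the one that wrote the segments, `DirectoryReader.open` will fail, which is exactly the compatibility problem described above.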

D'oh ;-(

Currently we are using Spark, with the fantastic Es4Hadoop plugin, to export ES indices into Parquet files. This works fine, but it involves two extra technologies (Spark / Parquet), whereas I would prefer to keep ES only.
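For anyone curious, the export looks roughly like this. A minimal sketch in Java with the elasticsearch-hadoop Spark SQL connector; the host, index name, and output path are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EsIndexToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("es-index-to-parquet")
                // Hypothetical cluster address; es.nodes is a standard es-hadoop setting
                .config("es.nodes", "es-host:9200")
                .getOrCreate();

        // Read the whole index over HTTP through the elasticsearch-hadoop connector
        Dataset<Row> df = spark.read()
                .format("org.elasticsearch.spark.sql")
                .load("my-index"); // hypothetical index name

        // Write it back out as Parquet (hypothetical HDFS path)
        df.write().parquet("hdfs:///exports/my-index");

        spark.stop();
    }
}
```

Note that this pulls every document over the REST API via scroll queries, which is why reading snapshot files directly would have been so much faster for us.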

Thank you for the explanation.

I want to vote for this as well. My Spark job reads from the cluster directly and it puts a big load on the cluster. This feature could definitely help with batch jobs.


Yeah, absolutely. Especially in the big-data world, it makes no sense to use HTTP for data transfer, it's so slow (also, es4hadoop doesn't use gzip, grr).

In my case, I am going to move from Elasticsearch to druid.io, which is more big-data friendly.


Hi, how do you create Parquet files from an ES index snapshot?
