Read ES index snapshot in Spark without restore

prabhushrikant · August 4, 2017, 8:13am

I have a large ES index in a production cluster that is snapshotted daily. For analytics purposes, I want to read all the docs in that index into an RDD (the ES-Hadoop library for Apache Spark could be useful here), but I don't want to restore the snapshot to an ES cluster, which seems wasteful. Is this possible? If not, what other options do I have? ElasticDump?

james.baiera · August 4, 2017, 6:06pm

As I mentioned in a previous post about this:

There was some talk a while ago about potentially supporting this in some fashion, but ultimately it was decided against. When you create a snapshot in Elasticsearch, you're just copying the Lucene index files to a block storage location, but the format of those Lucene files can change from release to release, so reading them requires a reader that matches the ES version that wrote them. It's an idea that has lots of potential for performance improvement, but the drawbacks we're currently finding with it mean that we're not pursuing it at the moment.

It would be a cool feature to include (#PRsWelcome) but it's not one that we are pursuing at the moment.
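
For reference, the route ES-Hadoop does support is reading from a running cluster rather than from snapshot files. Below is a minimal PySpark sketch of that, assuming the ES-Hadoop jar is on the Spark classpath. The host name, index name, and the helper functions `es_read_conf` and `read_index_as_rdd` are placeholders made up for illustration; the `es.*` keys and the `EsInputFormat` class are the standard ES-Hadoop ones.

```python
def es_read_conf(nodes, resource, query=None, port=9200):
    """Build the ES-Hadoop configuration for reading one index.

    nodes, resource and query are placeholders supplied by the caller;
    nothing here is specific to any particular cluster.
    """
    conf = {
        "es.nodes": nodes,        # host(s) of a live ES cluster
        "es.port": str(port),     # Hadoop conf values must be strings
        "es.resource": resource,  # index (or index/type) to read
    }
    if query is not None:
        conf["es.query"] = query  # optional filter; omit to read every doc
    return conf


def read_index_as_rdd(sc, conf):
    """Return an RDD of (doc id, doc fields) pairs read via ES-Hadoop."""
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )


# Example use inside a Spark job (assumed names, not from this thread):
#   from pyspark import SparkContext
#   sc = SparkContext(appName="es-export")
#   rdd = read_index_as_rdd(sc, es_read_conf("es-host.example", "my-index/my-type"))
```

This still puts read load on a live cluster, so it does not avoid the restore if the data only exists in the snapshot; it only shows what the connector can read today.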

