I have a large ES index in a production cluster that is snapshotted daily. For analytics purposes, I want to read all the docs in that index into an RDD (the ES-Hadoop library jars for Apache Spark could be useful here), but I don't want to restore the snapshot to an ES cluster, which seems wasteful. Is this possible? If not, what other options do I have? ElasticDump?
As I mentioned in a previous post about this:
There was some talk a while ago about potentially supporting this in some fashion, but ultimately it was decided against. When you create a snapshot in Elasticsearch, you're just moving the Lucene index files to a block storage location, but those Lucene files can change from release to release and require a reader from the matching ES version. It's an idea that has lots of potential for performance improvement, but the drawbacks we're currently finding with it mean that we're not pursuing it at the moment.
It would be a cool feature to include (#PRsWelcome) but it's not one that we are pursuing at the moment.