I need to import the data from an Elasticsearch cluster, but I can't read the cluster directly. I know I could restore the snapshot into a cluster; I'm just wondering whether I can read the snapshot files directly from Hadoop.
One more question: does repository-hdfs write in the same format as a local file system repository?
You can read the snapshot, but you're on your own: there are no guarantees about its format, or that it will stay the same across versions.
repository-hdfs only exposes HDFS to the Snapshot API; it does not alter or interfere with the file format.
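A minimal sketch of what "reading it yourself" might look like, assuming the repository layout where the repository root contains an `index-N` JSON file listing the snapshots. As noted above, this layout is undocumented and version-specific, so treat the parsing below as illustrative only:

```python
import glob
import json
import os

def list_snapshot_names(repo_dir):
    """Best-effort listing of snapshot names from a snapshot repository directory.

    Assumes the repository root holds index-N files (JSON); the exact schema
    varies across Elasticsearch versions, so both known shapes are handled.
    """
    index_files = glob.glob(os.path.join(repo_dir, "index-*"))
    if not index_files:
        return []
    # Pick the highest generation number (index-0, index-1, ...).
    latest = max(index_files, key=lambda p: int(p.rsplit("-", 1)[1]))
    with open(latest) as f:
        data = json.load(f)
    names = []
    for snap in data.get("snapshots", []):
        # Newer versions store objects ({"name": ..., "uuid": ...}),
        # older ones store bare name strings.
        names.append(snap["name"] if isinstance(snap, dict) else snap)
    return names
```

This only recovers the snapshot names; the actual data lives in per-index Lucene segment files whose layout is even less stable.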
Hundreds of days later: is there any new open-source repo that implements this?
Not that I'm aware of, and furthermore it's not something I'd recommend. It's much faster and easier to just export the data as JSON (or another format), since a snapshot contains not just the data but also the ES metadata, which is version-specific and meant for ES only.
Exporting the data as JSON through the scroll API is much slower than the snapshot API.
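For reference, a scroll-based export can be sketched with the official Python client's `elasticsearch.helpers.scan` iterator (which drives the scroll API under the hood). The cluster object, the index name `cold-logs`, and the page size are placeholders:

```python
import json

def doc_to_ndjson_line(hit):
    """Serialize one scroll hit's _source as a compact NDJSON line."""
    return json.dumps(hit["_source"], separators=(",", ":"))

def export_index(es, index, out_path, page_size=1000):
    """Stream every document of `index` into a newline-delimited JSON file."""
    from elasticsearch.helpers import scan  # scroll-based iterator, official client
    with open(out_path, "w") as out:
        for hit in scan(es,
                        index=index,
                        query={"query": {"match_all": {}}},
                        size=page_size):
            out.write(doc_to_ndjson_line(hit) + "\n")

# Usage (assumes a reachable cluster):
#   from elasticsearch import Elasticsearch
#   export_index(Elasticsearch("http://localhost:9200"), "cold-logs", "cold-logs.ndjson")
```

Even with large pages this pulls every document through the cluster one search response at a time, which is where the slowdown relative to a file-level snapshot copy comes from.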
I just thought that running Hadoop jobs driven by ES queries could be easier than writing pure batch MapReduce code. And we could save some space if we didn't need to ship the data to both ES and Hadoop; we would just snapshot the cold data to Hadoop.
Like Splunk's Hunk product.