I spent some time figuring out how ES can integrate with HDFS.
We have an ES cluster running on top of YARN and want the cluster to be fail-safe, e.g. survive a YARN restart.
My conclusion is:
(1) you can mount HDFS via NFS and point ES at an NFS path (downside: slowdown)
(2) you can use repository-hdfs and 'manually' take care of backup and restore to and from HDFS (see the sketch below)
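For (2), this is roughly what I have in mind. A minimal sketch via elasticsearch-py, assuming the repository-hdfs plugin is installed on every node; the ES endpoint, repository name, and HDFS URI/path are placeholders, not a tested setup:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])  # placeholder ES endpoint

# Register an HDFS-backed snapshot repository (requires repository-hdfs).
es.snapshot.create_repository(
    repository="hdfs_backup",  # placeholder repository name
    body={
        "type": "hdfs",
        "settings": {
            "uri": "hdfs://namenode:8020",        # placeholder NameNode URI
            "path": "/elasticsearch/snapshots",   # placeholder HDFS path
        },
    },
)

# The 'manual' part: taking snapshots is something you have to schedule yourself.
es.snapshot.create(
    repository="hdfs_backup",
    snapshot="snapshot_1",
    wait_for_completion=True,
)

# After a YARN restart wiped the local data, restore from HDFS
# (the affected indices must be closed or deleted first).
es.snapshot.restore(repository="hdfs_backup", snapshot="snapshot_1")
```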
Any other options?
Also, I'm still undecided whether to use ES 1.x or 2.x; does it matter in that respect?
best
Johannes
Ok, understood. Don't use option (1) / NFS-HDFS.
But are those really all my options? Isn't there an option (3) where all my data is persisted in HDFS but the nodes operate on a local copy, or something like that!?
What I think @ssatapathy is suggesting is to keep your data in HDFS (as primary storage) and load it into ES through Hadoop jobs. ES then runs with its out-of-the-box configuration, writing its indices to the local disks/storage.
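If it helps, here is a hedged sketch of that pattern in plain Python; a real pipeline would more likely use es-hadoop from a MapReduce/Spark job. The WebHDFS client (the `hdfs` package), host names, index name, and file path below are all assumptions:

```python
# HDFS stays the system of record; a batch job (re)loads it into ES,
# and ES keeps its indices on local disk as usual.
import json

from hdfs import InsecureClient  # pip install hdfs (WebHDFS client)
from elasticsearch import Elasticsearch, helpers

hdfs_client = InsecureClient("http://namenode:50070")  # placeholder WebHDFS endpoint
es = Elasticsearch(["localhost:9200"])                 # placeholder ES endpoint

def docs_from_hdfs(path):
    """Yield bulk-indexable actions from a newline-delimited JSON file in HDFS."""
    with hdfs_client.read(path, encoding="utf-8", delimiter="\n") as reader:
        for line in reader:
            if line.strip():
                yield {
                    "_index": "events",   # placeholder index name
                    "_type": "event",     # types still exist in ES 1.x/2.x
                    "_source": json.loads(line),
                }

# (Re)build the index from the HDFS copy, e.g. after a YARN restart lost local data.
helpers.bulk(es, docs_from_hdfs("/data/events/part-00000"))  # placeholder path
```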