HDFS storage options

(Johannes Zillmann) #1

I spent some time figuring out how ES can integrate with HDFS.
We have an ES cluster running on top of YARN and want the cluster to be fail-safe, e.g. to survive a YARN restart.

My conclusion is:

  • (1) you can mount HDFS via NFS and point ES to an NFS path (downside: slowdown)
  • (2) you can use repository-hdfs and 'manually' take care of backup and restore to and from HDFS
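
For illustration, option (2) could look roughly like the sketch below. It only builds the HTTP requests for the standard snapshot REST endpoints (`_snapshot`, `_restore`) and does not send them. The ES address, repository name, snapshot name, and HDFS URI/path are placeholders, and the `uri` and `path` repository settings are assumed to match your repository-hdfs plugin version, so check the plugin docs before relying on them.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: address of one ES node


def _json_request(method, path, body=None):
    """Build (but do not send) a JSON request against ES_URL."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(ES_URL + path, data=data, method=method,
                   headers={"Content-Type": "application/json"})


def register_hdfs_repo(repo, hdfs_uri, hdfs_path):
    """PUT /_snapshot/<repo>: register an HDFS-backed snapshot repository."""
    # Assumed settings names for repository-hdfs: "uri" and "path".
    body = {"type": "hdfs", "settings": {"uri": hdfs_uri, "path": hdfs_path}}
    return _json_request("PUT", "/_snapshot/" + repo, body)


def create_snapshot(repo, snapshot):
    """PUT /_snapshot/<repo>/<snapshot>: back up the indices to the repository."""
    path = "/_snapshot/%s/%s?wait_for_completion=true" % (repo, snapshot)
    return _json_request("PUT", path)


def restore_snapshot(repo, snapshot):
    """POST /_snapshot/<repo>/<snapshot>/_restore: restore from the repository."""
    return _json_request("POST", "/_snapshot/%s/%s/_restore" % (repo, snapshot))
```

Passing each returned request to `urllib.request.urlopen` would execute it against a live cluster; scheduling the snapshot call (e.g. from cron) is what the 'manual' part of option (2) amounts to.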

Any other options?
Also, I'm still undecided on whether to use ES 1.x or 2.x. Does it matter in this respect?

(Mark Walkom) #2

Running ES on an NFS-mounted HDFS (option 1) will be really slow, to the point where it'd be unusable, and we do not recommend it.

(Johannes Zillmann) #3

OK, understood. Don't use option (1) / NFS-HDFS.
But are those all my options? Isn't there an option (3) where all my data is persisted in HDFS but the nodes operate on a local copy, or anything like that?

(Sanjukta) #4

Actually you can. You can have HDFS as the primary storage and upload the data from HDFS to ES, where the data can live on the nodes' local disks.

(Johannes Zillmann) #5

And how is that configured?

(Costin Leau) #6

What I think @ssatapathy is suggesting is to keep your data in HDFS (primary storage) and load it into ES through Hadoop jobs. ES uses its out-of-the-box configuration and writes data to the local disks/storage.
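
As a rough illustration of the loading step: in a real setup a Hadoop job (for example one using the elasticsearch-hadoop connector) would read the records from HDFS and index them. The sketch below is not that connector. It is a plain Python stand-in that turns JSON lines, as they might be read from a file in HDFS, into a body for the ES `_bulk` endpoint. The function name, index name, and type name are hypothetical.

```python
import json


def bulk_body(lines, index, doc_type=None):
    """Turn JSON lines (one document per line) into a _bulk request body."""
    meta = {"_index": index}
    if doc_type is not None:
        # ES 1.x/2.x expect a document type; the name is up to you.
        meta["_type"] = doc_type
    action = json.dumps({"index": meta})

    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)  # fail early on malformed records
        out.append(action)           # action/metadata line
        out.append(json.dumps(doc))  # document source line
    # The bulk format is newline-delimited and must end with a newline.
    return "\n".join(out) + "\n" if out else ""
```

The resulting string would be sent with a POST to `/_bulk`. Because HDFS stays the primary copy, a lost ES cluster can be rebuilt by re-running the load.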
