HDFS storage options

Johannes_Zillmann · January 5, 2016, 3:21pm

I spent some time looking into how ES can integrate with HDFS.
We have an ES cluster running on top of YARN and want the cluster to be fail-safe, e.g. to survive a YARN restart.

My conclusion is:

(1) you can mount HDFS as NFS and point ES to an NFS path (downside: slowdown)
(2) you can use repository-hdfs and take care of backup and restore to and from HDFS 'manually'
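
For readers wondering what option (2) involves in practice, here is a minimal sketch of the snapshot and restore calls, assuming the repository-hdfs plugin is installed on every node. The host, repository name, snapshot name and HDFS locations are hypothetical placeholders, and the helper functions are illustrative rather than part of any client library. They only build the REST requests; `send` is an optional way to issue them.

```python
import json
from urllib import request

# Assumption: a node reachable on the default HTTP port.
ES_URL = "http://localhost:9200"


def register_hdfs_repo(repo, uri, path):
    """Build the request that registers an HDFS-backed snapshot repository."""
    body = {"type": "hdfs", "settings": {"uri": uri, "path": path}}
    return "PUT", "/_snapshot/" + repo, body


def create_snapshot(repo, snapshot):
    """Build the request that snapshots the cluster into the repository."""
    path = "/_snapshot/%s/%s?wait_for_completion=true" % (repo, snapshot)
    return "PUT", path, None


def restore_snapshot(repo, snapshot):
    """Build the request that restores a snapshot, e.g. after a YARN restart."""
    return "POST", "/_snapshot/%s/%s/_restore" % (repo, snapshot), None


def send(method, path, body=None, base_url=ES_URL):
    """Send one of the requests above to a live cluster and decode the reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical names; adjust to your own NameNode and directory layout.
    print(register_hdfs_repo("hdfs_backup", "hdfs://namenode:8020", "/backups/es"))
    print(create_snapshot("hdfs_backup", "snapshot_1"))
    print(restore_snapshot("hdfs_backup", "snapshot_1"))
```

The 'manual' part is scheduling the snapshot call regularly and triggering the restore once the cluster comes back up empty.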

Any other options?
Also, I'm still undecided on whether to use ES 1.x or 2.x. Does it matter in this respect?
best
Johannes

warkolm · January 5, 2016, 8:46pm

Option (1), mounting HDFS as NFS, will be really slow, to the point where it'd be unusable, and we do not recommend it.

Johannes_Zillmann · January 7, 2016, 10:08am

OK, understood: don't use option (1) / NFS-HDFS.
But are those all my options? Is there no option (3) where all my data is persisted in HDFS but the nodes operate on a local copy, or anything like that?

ssatapathy · January 7, 2016, 2:20pm

Actually you can. You can keep HDFS as the primary storage and upload the data from HDFS to ES, where the data then lives on the local disks of the ES nodes.

Johannes_Zillmann · January 14, 2016, 9:41am

And how is that configured?

costin · January 14, 2016, 3:37pm

What I think @ssatapathy is suggesting is to keep your data in HDFS (primary storage) and load it through Hadoop jobs into ES. ES itself uses its out-of-the-box configuration, writing data to the local disks/storage.
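
As a rough illustration of the loading step, the sketch below builds the newline-delimited payload that the ES `_bulk` endpoint accepts from records that a job has already read out of HDFS. It is not the es-hadoop connector, which handles this for Hadoop jobs; the function name, index name and type name are hypothetical, and reading from HDFS is left out entirely.

```python
import json


def bulk_index_payload(records, index, doc_type):
    """Turn an iterable of dicts into a _bulk request body.

    Each record becomes two lines: an action line naming the target
    index and type, followed by the document source itself.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(record))
    if not lines:
        return ""
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical records, standing in for rows read from a file in HDFS.
    sample = [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 7}]
    print(bulk_index_payload(sample, "clicks", "event"))
```

The resulting string would be POSTed to `/_bulk` on any node. Since HDFS stays the system of record, an index lost in a YARN restart can be rebuilt by re-running the load.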
