NFS - Gateway FS index storage

bryanb · November 7, 2012, 3:32am

I need help configuring an Elasticsearch cluster. I would like to have the
index shared between two nodes via an NFS mount.

I have set the gateway.fs.location property to the NFS destination and I
see a directory is created for the cluster with indices and metadata as
subfolders. This looks okay.

However, it appears the directory specified in the path.data property also
contains an index. I didn't expect the path.data directory to also contain
the index. Why are there two copies of the index?

--

drewr · November 7, 2012, 7:18am

BryanB wrote:

I need help configuring an Elasticsearch cluster. I would like to
have the index shared between two nodes via an NFS mount.

I have set the gateway.fs.location property to the NFS destination
and I see a directory is created for the cluster with indices and
metadata as subfolders. This looks okay.

However, it appears the directory specified in the path.data
property also contains an index. I didn't expect the path.data
directory to also contain the index. Why are there two copies of
the index?

Gateways are a way for ES to asynchronously snapshot cluster data to
a central location for easy restoration. But as long as you're using
filesystem-based index storage (as opposed to RAM-based), path.data
will always contain your index data too.
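
To see the two copies side by side, here is a minimal Python sketch that lists the index directories found in the gateway location and in each node directory under path.data. The gateway layout (a cluster directory containing indices and metadata) is as described in the question; the path.data layout used here (`<path.data>/<cluster_name>/nodes/<N>/indices/`) is an assumption and may differ between ES versions. The function names and example paths are hypothetical.

```python
from pathlib import Path


def list_indices(indices_dir):
    """Return the sorted names of the index directories under indices_dir."""
    root = Path(indices_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def compare_locations(gateway_location, path_data, cluster_name):
    """Report which indices exist in the gateway and in each local node directory.

    Assumed layout (the path.data part is not confirmed by this thread):
      gateway:   <gateway_location>/<cluster_name>/indices/<index>/
      path.data: <path_data>/<cluster_name>/nodes/<N>/indices/<index>/
    """
    gateway = list_indices(Path(gateway_location) / cluster_name / "indices")
    nodes_root = Path(path_data) / cluster_name / "nodes"
    local = {}
    if nodes_root.is_dir():
        for node_dir in sorted(nodes_root.iterdir()):
            if node_dir.is_dir():
                # One entry per node-level subdirectory (0/, 1/, ...).
                local[node_dir.name] = list_indices(node_dir / "indices")
    return {"gateway": gateway, "local": local}


if __name__ == "__main__":
    # Hypothetical paths and cluster name; substitute your own settings.
    print(compare_locations("/mnt/nfs/es-gateway", "/var/data/es", "mycluster"))
```

If the same index name shows up under both keys, that matches the behavior described above: the gateway holds the snapshot and path.data holds the working copy.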

If you're starting more than one node in a particular path.data, ES
will create more node-level subdirectories so they don't conflict,
e.g., 0/, 1/, etc. I know this works with multiple nodes running on
the same machine, but I haven't tried it with different machines
accessing the same volume. In theory you should be able to stick
with local gateway and point path.data at the NFS mount and keep only
one copy of your data.

Unfortunately, this also comes with a cost of now not only having to
hit the disk, but do it over the network with other machines
contending for the same resource.

-Drew

--

dadoonet · November 7, 2012, 7:22am

Hi Bryan,

Why do you want to share indexes with NFS?
Do you think that you must do it or is it a disk space concern?
Beware of IO when storing indexes on a network FS.

That said, the documentation says that the shared FS gateway is used to store ES metadata (replacing path.work).
That means that your index will always be written to path.data.

See: http://www.elasticsearch.org/guide/reference/modules/gateway/fs.html

My 2 cents

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

bryanb · November 8, 2012, 1:51pm

The NFS was a constraint added by our admins. That constraint has been
removed and local storage has been allocated on each node for index
storage. I still have the NFS mount available. Should I use the NFS as a
gateway FS? Our ES deployment is relatively simple. All of our indexed data
is backed by a database. We have a job to rebuild and swap the index using
aliasing if something goes wrong.
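
For readers unfamiliar with the rebuild-and-swap approach mentioned here, a minimal Python sketch of the request body for such a swap follows. It assumes the swap goes through the Elasticsearch `_aliases` endpoint with a `remove` action and an `add` action in one request; the alias and index names are hypothetical, and the sketch only builds the JSON rather than sending it.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for a POST to /_aliases that repoints alias to new_index."""
    return {
        "actions": [
            # Both actions travel in one request, so the alias is never left dangling.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: "products" is the alias the application searches against.
    body = build_alias_swap("products", "products_v1", "products_v2")
    print(json.dumps(body, indent=2))
```

With a setup like this, the index can be rebuilt from the database at any time, which is part of why the choice of gateway matters less for recovery.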

--

radu_gheorghe · November 9, 2012, 10:52am

Hello Bryan,

On Thu, Nov 8, 2012 at 3:51 PM, BryanB bryan.bringle@gmail.com wrote:

The NFS was a constraint added by our admins. That constraint has been
removed and local storage has been allocated on each node for index storage.
I still have the NFS mount available. Should I use the NFS as a gateway FS?

I would only use NFS gateway in your situation if the local storage is
so unreliable that replicas wouldn't be enough to ensure I don't lose
data. Local Gateway is usually the way to go, not only for
performance, but also because it's what most users have, so it's the
more proven (see "tested" :p) way.

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene
