NFS - Gateway FS index storage

I need help configuring an Elasticsearch cluster. I would like to have the
index shared between two nodes via an NFS mount.

I have set the gateway.fs.location property to the NFS destination and I
see a directory is created for the cluster with indices and metadata as
subfolders. This looks okay.

However, it appears the directory specified in the path.data property also
contains an index. I didn't expect the path.data directory to also contain
the index. Why are there two copies of the index?

--

Gateways are a way for ES to asynchronously snapshot cluster data to
a central location for easy restoration. But as long as you're using
filesystem-based index storage (as opposed to RAM-based), path.data
will always contain your index data too.
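
For reference, here is a minimal sketch of the setup described in the question, assuming the shared FS gateway settings as documented at the time. Both paths are hypothetical placeholders. It shows why two copies appear: the gateway location receives the snapshots, while path.data holds the working index.

```yaml
# elasticsearch.yml (sketch; both paths are hypothetical)
gateway.type: fs                           # shared filesystem gateway
gateway.fs.location: /mnt/nfs/es-gateway   # snapshots of indices and metadata
path.data: /var/lib/elasticsearch          # working copy of the index
```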

If you're starting more than one node in a particular path.data, ES
will create more node-level subdirectories so they don't conflict,
e.g., 0/, 1/, etc. I know this works with multiple nodes running on
the same machine, but I haven't tried it with different machines
accessing the same volume. In theory you should be able to stick
with local gateway and point path.data at the NFS mount and keep only
one copy of your data.

Unfortunately, this also comes with a cost of now not only having to
hit the disk, but do it over the network with other machines
contending for the same resource.
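
A sketch of that single-copy variant, assuming the local gateway and a hypothetical NFS mount point. As noted above, this is untested with different machines sharing the volume, so treat it as an illustration rather than a recommendation.

```yaml
# elasticsearch.yml (sketch; the mount point is hypothetical)
gateway.type: local            # no separate snapshot location
path.data: /mnt/nfs/es-data    # each node gets its own subdirectory here
```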

-Drew

--

Hi Bryan,

Why do you want to share indexes with NFS?
Do you think that you must do it or is it a disk space concern?
Beware of IO when storing indexes on a network FS.

That said, the documentation says that the shared FS gateway is used to store ES metadata (it replaces path.work).
That means your index will always be written to path.data.

See: http://www.elasticsearch.org/guide/reference/modules/gateway/fs.html

My 2 cents

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

The NFS was a constraint added by our admins. That constraint has been
removed and local storage has been allocated on each node for index
storage. I still have the NFS mount available. Should I use the NFS as a
gateway FS? Our ES deployment is relatively simple. All of our indexed data
is backed by a database. We have a job to rebuild and swap the index using
aliasing if something goes wrong.

--

Hello Bryan,

On Thu, Nov 8, 2012 at 3:51 PM, BryanB bryan.bringle@gmail.com wrote:

The NFS was a constraint added by our admins. That constraint has been
removed and local storage has been allocated on each node for index storage.
I still have the NFS mount available. Should I use the NFS as a gateway FS?

I would only use NFS gateway in your situation if the local storage is
so unreliable that replicas wouldn't be enough to ensure I don't lose
data. Local Gateway is usually the way to go, not only for
performance, but also because it's what most users have, so it's the
more proven (read: "tested" :p) way.
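
As a rough sketch of that setup, assuming a hypothetical local data directory and one replica per shard so that each node holds a copy:

```yaml
# elasticsearch.yml (sketch; the path is hypothetical)
gateway.type: local                 # recover from each node's local storage
path.data: /var/lib/elasticsearch   # local disk on each node
index.number_of_replicas: 1         # a copy of every shard on another node
```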

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--