ElasticSearch fails on NFS, makes tons of empty directories in nodes


(Gregory Rice) #1

Hey guys,

I've changed data.path to an NFS share mounted on my single-node
elasticsearch machine. The monolithic elasticsearch has no problem
creating the elasticsearch directory, or the nodes directory. On local
drives it has no trouble making a node directory and filling it with
data, but on my NFS share it simply creates a bunch of directories in
the /nfs_share/data/elasticsearch/nodes directory at a rate of about one
every 30 seconds until, after a while, there's like fifty nodes listed
in the directory, each completely empty except for a node.lock file.

Obviously it's able to create the node.lock file and the directory
structure, so it's not a permissions issue, but is there a particular
reason something like this would happen on NFS when there's no problem
getting data onto a local disk, with only one directory created under
"nodes"?

Anyone seen anything like this or know what I can do to help figure
out what's going on?

Thanks,
Greg


(Radu Gheorghe) #2

Hi Greg,

In Elasticsearch, do you have the Gateway left to the default "local",
or did you change it to "fs"?

We have two nodes here that write to an NFS share with the
Gateway set to "local", and we had a lot of strange permission issues.
What works now is that we have all_squash set and the data directory
belongs to nobody:nogroup.

Hope this helps,
Radu


(Gregory Rice) #3

Radu,

I tried changing the gateway to "fs". I tried moving the gateway
location to the share.

What happens is, when I set the data.path directory to a directory on
the fileshare, elasticsearch doesn't even really start up. It gets as
far as:

[2012-02-22 20:06:12,212][INFO ][node ] [Ziggy Pig] {0.18.7}[3907]: initializing ...
[2012-02-22 20:06:12,218][INFO ][plugins ] [Ziggy Pig] loaded [], sites []

And hangs.

Changing the gateway type or gateway location did nothing. Remember,
it doesn't look like permissions, as ElasticSearch has no problem
creating node/0 node/1 node/2 node/3 ENDLESSLY on the NFS drive, so
obviously it's able to write. It simply cannot initiate the node, so
it tries creating another one.

Any other ideas? It's not able to create anything but endless new
nodes (as it never is able to initialize one successfully, I guess)
and then puts a node.lock file in them and moves on and makes another.

Thanks,
Greg Rice


(Radu Gheorghe) #4

Nope, I don't have any other ideas. I would just try with all_squash
and chown -R nobody:nogroup on the data directory. And with the
default "local" gateway.


(Shay Banon) #5

It sounds like it failed to acquire an exclusive lock on the node.lock file (which it needs to do). We can try and see why the problem happens, but why use NFS to store the data?
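
For anyone who wants to check the locking theory outside of Elasticsearch, here is a minimal sketch in Python that tries to take a non-blocking exclusive lock on a scratch file in a given directory. It rests on assumptions: it uses POSIX locks via fcntl.lockf, which may not behave exactly like the lock the JVM takes on node.lock, and the function name can_lock_exclusively and the probe file name lock_probe.tmp are made up for illustration. On some NFS setups the call may also hang rather than fail.

```python
import fcntl
import os


def can_lock_exclusively(directory):
    """Try a non-blocking exclusive POSIX lock on a scratch file in `directory`.

    Returns True if the lock was acquired, False if the lock call failed.
    Errors while creating the scratch file (e.g. permissions) are not
    caught, so they surface separately from locking problems.
    """
    # Hypothetical scratch file name; it is not something Elasticsearch uses.
    probe = os.path.join(directory, "lock_probe.tmp")
    try:
        with open(probe, "w") as handle:
            try:
                # LOCK_NB asks for an immediate failure instead of waiting.
                fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                # Being able to create files but not lock them would match
                # the symptom described in this thread.
                return False
            fcntl.lockf(handle, fcntl.LOCK_UN)
            return True
    finally:
        # Clean up the scratch file whether or not locking worked.
        if os.path.exists(probe):
            os.remove(probe)
```

Calling can_lock_exclusively("/nfs_share/data/elasticsearch") and then the same function on a local directory would show whether files can be created but not locked on the share. If so, the NFS lock service and mount options are probably the place to look.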

