Elasticsearch with Amazon Elastic File System


#1

Hi,

I have a single Elasticsearch node on an Amazon EC2 instance. I've also mounted an AWS Elastic File System (EFS) on the instance under /usr/share/elasticsearch. Elasticsearch starts normally and I can see that it creates files on the mounted EFS, but when I try to index a document I get this error:

{
  "error" : {
    "root_cause" : [ {
      "type" : "unavailable_shards_exception",
      "reason" : "[customer][3] primary shard is not active Timeout: [1m], request: [index {[customer][external][1], source[\n{\n \"name\": \"John Doe\"\n}]}]"
    } ],
    "type" : "unavailable_shards_exception",
    "reason" : "[customer][3] primary shard is not active Timeout: [1m], request: [index {[customer][external][1], source[\n{\n \"name\": \"John Doe\"\n}]}]"
  },
  "status" : 503
}

When I look at the elasticsearch output I see this error:

[2016-07-19 10:19:34,241][INFO ][cluster.metadata ] [Zarek] [customer] creating index, cause [auto(index api)], templates [], shards [5]/[1], mappings [external]
[2016-07-19 10:19:34,348][WARN ][indices.cluster ] [Zarek] [[customer][1]] marking and sending shard failed due to [failed to create shard]
java.lang.NullPointerException
    at org.elasticsearch.index.shard.ShardPath.selectNewPathForShard(ShardPath.java:241)
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:336)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:601)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:501)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:166)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[2016-07-19 10:19:34,351][WARN ][cluster.action.shard ] [Zarek] [customer][1] received shard failed for target shard [[customer][1], node[35FNNM0hSsCWOyRQZPsz8A], [P], v[1], s[INITIALIZING], a[id=piu83IRBQNCpi2OLm4aGrw], unassigned_info[[reason=INDEX_CREATED], at[2016-07-19T10:19:34.252Z]]], indexUUID [ii2mnw8tQaCrfQ27Z5tkvA], message [failed to create shard], failure [NullPointerException[null]]
java.lang.NullPointerException
    ...
[2016-07-19 10:19:34,364][WARN ][indices.cluster ] [Zarek] [[customer][3]] marking and sending shard failed due to [failed to create shard]
java.lang.NullPointerException

The cluster health shows this:

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 0.0
}

Clearly Elasticsearch cannot create shards on the Elastic File System. Yet it is able to write to it, since it stores a bunch of files and folders there (like /usr/share/elasticsearch/nodes/0/indices and /usr/share/elasticsearch/nodes/0/_state).

I added rwx permissions to everyone under the mounted /usr/share/elasticsearch, but it doesn't help.

SELinux is also disabled.

Forcing shard reallocation doesn't help either.

Am I missing something in regard to Elasticsearch and NFS permissions?


(David Pilato) #2

Don't use elasticsearch with NFS.


#3

I am still testing elasticsearch, so performance isn't important for me at this stage.
However I'm interested in resolving the error that I got, because it doesn't make any sense: elasticsearch writes files on the storage, but cannot save shards.


(Michael McCandless) #4

Hmm, is this ES 2.3? That line is here:

So it seems bestPath is null, which should only happen if env.nodePaths() returned an empty array. But when I look at NodeEnvironment.java and Environment.java, I don't see how an empty NodePath array can be set on NodeEnvironment... odd.

Mike


#5

Yes, it's the latest elasticsearch.
I thought at first that it had something to do with ES not being able to acquire a lock on files on the AWS Elastic File System. However, the AWS documentation says that EFS does support file locks.

So I guess there is some kind of NFS particularity that is bugging things. I also tried mounting the NFS with various options such as noatime,nolock,auto,sec=sys but without any luck.

When I point the data directory to the local file system it works just fine.


(Michael McCandless) #6

OK I think I see one way that NPE could happen: if FileStore.getUsableSpace() returns Long.MIN_VALUE (which is odd!). I'll fix ShardPath to guard against this...
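To illustrate the mechanism, here is a minimal sketch of a "pick the path with the most usable space" loop. It is not the actual ShardPath code: the class BestPathSketch, the DataPath type, and selectBestPath are hypothetical names, and the sketch assumes the running maximum starts at Long.MIN_VALUE and is updated with a strict comparison. Under those assumptions, a path that reports Long.MIN_VALUE never wins and the result stays null.

```java
import java.util.Arrays;
import java.util.List;

class BestPathSketch {
    // Hypothetical stand-in for a data path and the usable space its file store reports.
    static final class DataPath {
        final String name;
        final long usableBytes;

        DataPath(String name, long usableBytes) {
            this.name = name;
            this.usableBytes = usableBytes;
        }
    }

    // Returns the path with the most usable space, or null if no path
    // reports more than the initial maximum (assumed to be Long.MIN_VALUE).
    static DataPath selectBestPath(List<DataPath> paths) {
        long maxUsableBytes = Long.MIN_VALUE;
        DataPath bestPath = null;
        for (DataPath path : paths) {
            // Strict comparison: a value equal to Long.MIN_VALUE is never selected.
            if (path.usableBytes > maxUsableBytes) {
                maxUsableBytes = path.usableBytes;
                bestPath = path;
            }
        }
        return bestPath;
    }

    public static void main(String[] args) {
        DataPath local = new DataPath("local", 14403760L * 1024L);
        DataPath overflowed = new DataPath("efs", Long.MIN_VALUE);

        System.out.println(selectBestPath(Arrays.asList(local)).name); // a normal path is selected
        System.out.println(selectBestPath(Arrays.asList(overflowed))); // no path is selected
    }
}
```

Any caller that dereferences the result without a null check would then fail with a NullPointerException, which matches the symptom in the log above.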

Mike


(Michael McCandless) #7

I opened this PR: https://github.com/elastic/elasticsearch/pull/19554


(Michael McCandless) #8

Hi @ivten, can you run df on this node and post the output here? We are trying to dig into the JDK sources to understand whether/how FileStore.getUsableSpace could return this. Thanks.


#9

Here it is:

[ec2-user@ip-172-21-2-107 ~]$ df
Filesystem                                                   1K-blocks     Used        Available Use% Mounted on
/dev/xvda1                                                    20509288  6005280         14403760  30% /
devtmpfs                                                        243056       60           242996   1% /dev
tmpfs                                                           251632        0           251632   0% /dev/shm
eu-west-1a.fs-4514fa8c.efs.eu-west-1.amazonaws.com:/  9007199254740992        0 9007199254740992   0% /usr/share/elasticsearch/data

[ec2-user@ip-172-21-2-107 ~]$ df -h
Filesystem                                            Size  Used Avail Use% Mounted on
/dev/xvda1                                             20G  5.8G   14G  30% /
devtmpfs                                              238M   60K  238M   1% /dev
tmpfs                                                 246M     0  246M   0% /dev/shm
eu-west-1a.fs-4514fa8c.efs.eu-west-1.amazonaws.com:/  8.0E     0  8.0E   0% /usr/share/elasticsearch/data


#10

That is definitely the case. Thank you for confirming what is happening. I reported it to the openjdk lists here:

http://mail.openjdk.java.net/pipermail/nio-dev/2016-July/003784.html

For now, maybe we can treat the overflowed results as unsigned long in our code, but IMO it does not really solve the problem... let's see what they say.
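The arithmetic behind the overflow can be checked with a short sketch. It assumes the usable space in bytes is computed as a block count times a block size in signed 64-bit arithmetic; the class UsableSpaceOverflow and its helpers blocksToBytes and clampOverflow are hypothetical names for illustration, not JDK or Elasticsearch APIs. The df output earlier in the thread reports 9007199254740992 available 1K-blocks, which is 2^53, so the byte count is 2^63, one more than Long.MAX_VALUE.

```java
class UsableSpaceOverflow {
    // Converts a count of 1K blocks to bytes with plain signed arithmetic,
    // which silently wraps around on overflow.
    static long blocksToBytes(long oneKBlocks) {
        return oneKBlocks * 1024L;
    }

    // One possible guard (an assumption, not necessarily what any fix does):
    // treat a negative, overflowed size as "as large as a long can express".
    static long clampOverflow(long bytes) {
        return bytes < 0 ? Long.MAX_VALUE : bytes;
    }

    public static void main(String[] args) {
        long efsBlocks = 9007199254740992L; // "Available" 1K-blocks from df, equal to 2^53
        long bytes = blocksToBytes(efsBlocks);

        System.out.println(bytes == Long.MIN_VALUE);                // true: 2^63 wraps to Long.MIN_VALUE
        System.out.println(Long.toUnsignedString(bytes));           // 9223372036854775808
        System.out.println(clampOverflow(bytes) == Long.MAX_VALUE); // true
    }
}
```

Reading the value as unsigned recovers the true size here, but only a caller that knows to do so benefits, which is one reason a fix in the JDK itself would be preferable.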


(Gpiccinni) #11

Hi, I'm having the same issue. Using the Oracle JDK doesn't solve it.


(ZillaG) #12

I have the same issue using EFS and Oracle JDK8u101 and ES v2.4.0


#13

The bug is now fixed in Elasticsearch 2.4.1: https://github.com/elastic/elasticsearch/pull/20527
I tested it and can confirm that it works.


(ZillaG) #14

+1 for ES v2.4.1 to fix this issue.

