Elasticsearch with Amazon Elastic File System

ivten · July 19, 2016, 11:18am

Hi,

I have a single Elasticsearch node on an Amazon EC2 instance. I've also mounted an AWS Elastic File System on the instance under /usr/share/elasticsearch. When I start Elasticsearch it starts normally and I can see that it creates files on the mounted EFS, but when I try to index a document I get this error:

{
  "error" : {
    "root_cause" : [ {
      "type" : "unavailable_shards_exception",
      "reason" : "[customer][3] primary shard is not active Timeout: [1m], request: [index {[customer][external][1], source[\n{\n \"name\": \"John Doe\"\n}]}]"
    } ],
    "type" : "unavailable_shards_exception",
    "reason" : "[customer][3] primary shard is not active Timeout: [1m], request: [index {[customer][external][1], source[\n{\n \"name\": \"John Doe\"\n}]}]"
  },
  "status" : 503
}

When I look at the elasticsearch output I see this error:

[2016-07-19 10:19:34,241][INFO ][cluster.metadata ] [Zarek] [customer] creating index, cause [auto(index api)], templates [], shards [5]/[1], mappings [external]
[2016-07-19 10:19:34,348][WARN ][indices.cluster ] [Zarek] [[customer][1]] marking and sending shard failed due to [failed to create shard]
java.lang.NullPointerException
    at org.elasticsearch.index.shard.ShardPath.selectNewPathForShard(ShardPath.java:241)
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:336)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:601)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:501)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:166)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[2016-07-19 10:19:34,351][WARN ][cluster.action.shard ] [Zarek] [customer][1] received shard failed for target shard [[customer][1], node[35FNNM0hSsCWOyRQZPsz8A], [P], v[1], s[INITIALIZING], a[id=piu83IRBQNCpi2OLm4aGrw], unassigned_info[[reason=INDEX_CREATED], at[2016-07-19T10:19:34.252Z]]], indexUUID [ii2mnw8tQaCrfQ27Z5tkvA], message [failed to create shard], failure [NullPointerException[null]]
java.lang.NullPointerException
    ...
[2016-07-19 10:19:34,364][WARN ][indices.cluster ] [Zarek] [[customer][3]] marking and sending shard failed due to [failed to create shard]
java.lang.NullPointerException

The cluster health shows this:

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 0.0
}

Clearly elasticsearch cannot create shards on the Elastic File System. However at the same time it is able to use it as it stores a bunch of files and folders on it (like /usr/share/elasticsearch/nodes/0/indices and /usr/share/elasticsearch/nodes/0/_state).

I added rwx permissions to everyone under the mounted /usr/share/elasticsearch, but it doesn't help.

Selinux is also disabled.

Forcing shards reallocation doesn't help either.

Am I missing something in regard to Elasticsearch and NFS permissions?

dadoonet · July 19, 2016, 11:40am

Don't use elasticsearch with NFS.

ivten · July 21, 2016, 2:36pm

I am still testing elasticsearch, so performance isn't important for me at this stage.
However I'm interested in resolving the error that I got, because it doesn't make any sense: elasticsearch writes files on the storage, but cannot save shards.

mikemccand · July 21, 2016, 3:51pm

Hmm, is this ES 2.3? That line is here:

https://github.com/elastic/elasticsearch/blob/2.3/core/src/main/java/org/elasticsearch/index/shard/ShardPath.java#L241

            Integer count = dataPathToShardCount.get(nodePath.path);
            if (count != null) {
                usableBytes -= estShardSizeInBytes * count;
            }
            if (usableBytes > maxUsableBytes) {
                maxUsableBytes = usableBytes;
                bestPath = nodePath;
            }
        }

        statePath = bestPath.resolve(shardId);
        dataPath = statePath;
    }

    final String indexUUID = indexSettings.get(IndexMetaData.SETTING_INDEX_UUID, IndexMetaData.INDEX_UUID_NA_VALUE);

    return new ShardPath(NodeEnvironment.hasCustomDataPath(indexSettings), dataPath, statePath, indexUUID, shardId);
}

@Override
public boolean equals(Object o) {

So it seems bestPath is null, which should only happen if env.nodePaths() returned an empty array. But when I look at NodeEnvironment.java and Environment.java, I don't see how an empty NodePath array can be set on NodeEnvironment... odd.

Mike

ivten · July 21, 2016, 7:40pm

Yes, it's the latest elasticsearch.
I thought at first that it had something to do with ES not being able to acquire a lock on the files on the AWS Elastic File System. However, the AWS documentation says that EFS does support file locks.

So I guess there is some kind of NFS particularity that is bugging things. I also tried mounting the NFS with various options such as noatime,nolock,auto,sec=sys but without any luck.

When I point the data directory to the local file system it works just fine.

mikemccand · July 22, 2016, 12:08pm

OK I think I see one way that NPE could happen: if FileStore.getUsableSpace() returns Long.MIN_VALUE (which is odd!). I'll fix ShardPath to guard against this...

Mike
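To make the failure mode described above concrete, here is a minimal sketch of the selection loop. It is a simplified stand-in, not the Elasticsearch code: the class `BestPathSketch`, the method `selectBestPath`, the use of plain strings for paths, and the example path names are all hypothetical, and it assumes the running maximum starts at Long.MIN_VALUE, which is what the explanation above implies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical stand-in for the path-selection loop quoted earlier in the thread. */
class BestPathSketch {

    /** Returns the path with the most usable bytes, or null if no candidate beats the initial maximum. */
    static String selectBestPath(Map<String, Long> usableBytesByPath) {
        String bestPath = null;
        long maxUsableBytes = Long.MIN_VALUE; // assumed starting value
        for (Map.Entry<String, Long> entry : usableBytesByPath.entrySet()) {
            long usableBytes = entry.getValue();
            // Strict comparison: a value equal to Long.MIN_VALUE can never win.
            if (usableBytes > maxUsableBytes) {
                maxUsableBytes = usableBytes;
                bestPath = entry.getKey();
            }
        }
        return bestPath;
    }

    public static void main(String[] args) {
        Map<String, Long> local = new LinkedHashMap<>();
        local.put("/var/lib/es-local", 14_403_760L * 1024); // hypothetical local data path
        System.out.println(selectBestPath(local)); // prints /var/lib/es-local

        Map<String, Long> overflowed = new LinkedHashMap<>();
        overflowed.put("/usr/share/elasticsearch/data", Long.MIN_VALUE);
        System.out.println(selectBestPath(overflowed)); // prints null
    }
}
```

With a single data path reporting Long.MIN_VALUE, nothing is ever selected, so a later call on the result (such as bestPath.resolve(shardId) in the quoted code) would throw the NullPointerException seen in the log.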

mikemccand · July 22, 2016, 7:06pm

I opened this PR: https://github.com/elastic/elasticsearch/pull/19554

mikemccand · July 23, 2016, 8:12pm

Hi @ivten, can you run df on this node and post the output here? We are trying to dig into the JDK sources to understand whether/how FileStore.getUsableSpace could return this. Thanks.
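Alongside df, the value the JVM itself reports can be checked directly. The following is a small diagnostic sketch using only the standard java.nio.file API; the class name `FileStoreProbe` and the helper `describe` are made up for illustration, and the path to probe is passed as the first argument (defaulting to the current directory).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic: prints the space figures the JVM reports for the file store behind a path. */
class FileStoreProbe {

    static String describe(Path path) {
        try {
            FileStore store = Files.getFileStore(path);
            return String.format("%s type=%s total=%d usable=%d unallocated=%d",
                    store, store.type(),
                    store.getTotalSpace(), store.getUsableSpace(), store.getUnallocatedSpace());
        } catch (IOException e) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(describe(path));
    }
}
```

Running it against the EFS mount point and against a local directory shows whether the usable-space figure for the mount comes back negative.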

ivten · July 25, 2016, 7:56pm

Here it is:

[ec2-user@ip-172-21-2-107 ~]$ df
Filesystem                                                   1K-blocks    Used        Available Use% Mounted on
/dev/xvda1                                                    20509288 6005280         14403760  30% /
devtmpfs                                                        243056      60           242996   1% /dev
tmpfs                                                           251632       0           251632   0% /dev/shm
eu-west-1a.fs-4514fa8c.efs.eu-west-1.amazonaws.com:/  9007199254740992       0 9007199254740992   0% /usr/share/elasticsearch/data

[ec2-user@ip-172-21-2-107 ~]$ df -h
Filesystem                                            Size  Used Avail Use% Mounted on
/dev/xvda1                                             20G  5.8G   14G  30% /
devtmpfs                                              238M   60K  238M   1% /dev
tmpfs                                                 246M     0  246M   0% /dev/shm
eu-west-1a.fs-4514fa8c.efs.eu-west-1.amazonaws.com:/  8.0E     0  8.0E   0% /usr/share/elasticsearch/data

rmuir · July 25, 2016, 10:09pm

That is definitely the case. Thank you for confirming what is happening. I reported it to the openjdk lists here:

http://mail.openjdk.java.net/pipermail/nio-dev/2016-July/003784.html

For now, maybe we can treat the overflowed results as unsigned long in our code, but IMO it does not really solve the problem... let's see what they say.
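The arithmetic behind the overflow can be reproduced from the df output alone: the EFS mount reports 9007199254740992 one-kilobyte blocks, which is 2^53, and converting that to bytes gives 2^63, one more than a signed 64-bit long can hold. The sketch below only demonstrates that arithmetic and two ways of coping with it. The class `HugeFilesystemSketch` and the helper `clampOverflow` are hypothetical names; this is not presented as what the JDK does internally or as the code of the eventual Elasticsearch fix. Long.toUnsignedString requires Java 8 or later.

```java
/** Hypothetical illustration of the signed-long overflow suggested by the df output. */
class HugeFilesystemSketch {

    /** Clamps a negative (overflowed) byte count to Long.MAX_VALUE; other values pass through. */
    static long clampOverflow(long bytes) {
        return bytes < 0 ? Long.MAX_VALUE : bytes;
    }

    public static void main(String[] args) {
        long availableKiB = 9007199254740992L;     // "Available" column from df, in 1K blocks (2^53)
        long availableBytes = availableKiB * 1024; // 2^63 does not fit in a signed long and wraps around

        System.out.println(availableBytes == Long.MIN_VALUE);                 // prints true
        System.out.println(Long.toUnsignedString(availableBytes));            // prints 9223372036854775808
        System.out.println(clampOverflow(availableBytes) == Long.MAX_VALUE);  // prints true
    }
}
```

Reading the value as unsigned recovers the real size but leaves a number that still cannot be compared safely with ordinary signed longs, whereas clamping loses a little precision and keeps every later comparison well defined.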

gpiccinni · August 4, 2016, 8:03am

Hi, I'm having the same issue. Using Oracle JDK doesn't solve it.

ZillaG · October 18, 2016, 2:14pm

I have the same issue using EFS and Oracle JDK8u101 and ES v2.4.0

ivten · October 18, 2016, 6:38pm

The bug is now fixed in Elasticsearch 2.4.1: https://github.com/elastic/elasticsearch/pull/20527
I tested it and can confirm that it works.

ZillaG · October 27, 2016, 8:53pm

+1 for ES v2.4.1 to fix this issue.
