Failed to create shard, failure IOException[failed to obtain in-memory shard lock]

My cluster consists of 2 nodes, both Docker-based, running on different VMs in the same network.

My cluster health becomes yellow after a few hours: shards start going unassigned one by one, until after a day all the replica shards are unassigned. When I check the shard allocation it looks like this:

So I call the following command: POST /_cluster/reroute?retry_failed=true
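
For reference, the equivalent curl call looks roughly like this (the localhost:9200 address is just an assumption for wherever the cluster is reachable):

    # retry allocating shards whose allocation previously failed the maximum number of times
    curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"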

Immediately, the shards start to initialize:

After about 3-4 minutes, all the shards look assigned and the cluster health is green:

So, I started using the allocation/explain API: GET /_cluster/allocation/explain?pretty
And I got:

    {
      "index" : "projects",
      "shard" : 4,
      "primary" : false,
      "current_state" : "unassigned",
      "unassigned_info" : {
        "reason" : "MANUAL_ALLOCATION",
        "at" : "2020-07-21T08:22:48.307Z",
        "details" : "failed shard on node [Vnl1IdQOTdGDZcr0qG1Wxw]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[projects][4]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]; ",
        "last_allocation_status" : "no_attempt"
      },
      "can_allocate" : "awaiting_info",
      "allocate_explanation" : "cannot allocate because information about existing shard data is still being retrieved from some of the nodes",
      "node_allocation_decisions" : [
        {
          "node_id" : "Vnl1IdQOTdGDZcr0qG1Wxw",
          "node_name" : "eu01",
          "transport_address" : "172.18.4.6:9300",
          "node_decision" : "throttled",
          "deciders" : [
            {
              "decider" : "throttling",
              "decision" : "THROTTLE",
              "explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
            }
          ]
        },
        {
          "node_id" : "gI3ylY0JTNWuCSOSJ1vN2g",
          "node_name" : "us01",
          "transport_address" : "172.18.1.11:9300",
          "node_decision" : "no",
          "deciders" : [
            {
              "decider" : "same_shard",
              "decision" : "NO",
              "explanation" : "a copy of this shard is already allocated to this node [[projects][4], node[gI3ylY0JTNWuCSOSJ1vN2g], [P], s[STARTED], a[id=X-D0rlNmRmuTSkWlR3AQ7w]]"
            },
            {
              "decider" : "throttling",
              "decision" : "THROTTLE",
              "explanation" : "reached the limit of outgoing shard recoveries [2] on the node [gI3ylY0JTNWuCSOSJ1vN2g] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
            }
          ]
        }
      ]
    }
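
Note that without a request body the explain API just picks an unassigned shard to explain; a specific copy can be targeted explicitly, something like this (a sketch, assuming the same [projects][4] replica and a node reachable on localhost:9200):

    # ask specifically about the replica of shard 4 of the "projects" index
    curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
      -H 'Content-Type: application/json' \
      -d '{
        "index": "projects",
        "shard": 4,
        "primary": false
      }'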

I checked my disk space; it's 90% free, so that is not the issue here.
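
(In case it helps, per-node disk usage can also be double-checked from the cluster itself, e.g. with the _cat allocation API, again assuming localhost:9200:)

    # shows shard count per node plus disk.used / disk.avail / disk.percent
    curl -X GET "localhost:9200/_cat/allocation?v"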

Can someone help me understand what the issue is here and why the shards are getting unassigned every day?

Thanks

What type of storage are you using?

Premium SSDs

Where is this hosted? Could there be connectivity issues?

Azure memory optimized VMs. The network should be stable I think.

Your nodes are called us01 and eu01. Are they respectively in the US and the EU?

Yes, exactly. The ping between them is around 80ms

Ok, that seems like the expected behaviour then; transatlantic networking isn't nearly fast or reliable enough for this.

Hmm. Could it be because of the following settings in the jvm.options config?

## DNS cache policy

# cache ttl in seconds for positive DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.ttl; set to -1 to cache forever
-Des.networkaddress.cache.ttl=60

# cache ttl in seconds for negative DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.negative ttl; set to -1 to cache
# forever
-Des.networkaddress.cache.negative.ttl=10

And the full file is here:

## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms8g
-Xmx8g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1GC is only supported on JDK version 10 or later.
# To use G1GC uncomment the lines below.
# 10-:-XX:-UseConcMarkSweepGC
# 10-:-XX:-UseCMSInitiatingOccupancyOnly
# 10-:-XX:+UseG1GC
# 10-:-XX:InitiatingHeapOccupancyPercent=75

## DNS cache policy
# cache ttl in seconds for positive DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.ttl; set to -1 to cache forever
-Des.networkaddress.cache.ttl=60
# cache ttl in seconds for negative DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.negative ttl; set to -1 to cache
# forever
-Des.networkaddress.cache.negative.ttl=10

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# explicitly set the stack size
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
-XX:-OmitStackTraceInFastThrow

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true

-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=data

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9 Elasticsearch need to set the provider to COMPAT otherwise
# time/date parsing will break in an incompatible way for some date patterns and locals
9-:-Djava.locale.providers=COMPAT

No, I don't see how adjusting the DNS caching config (or indeed any other settings) can change the fact that transatlantic networking isn't nearly fast or reliable enough for this. Clusters should be contained in a single datacenter, maybe with remote clusters elsewhere in the world.
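
For anyone considering the remote-cluster route: one option is to run an independent cluster per region and connect them with cross-cluster search instead of stretching a single cluster across the Atlantic. A minimal sketch, assuming a reasonably recent (6.5+) version, where the alias "us" is arbitrary and the seed address just reuses us01's transport address from the explain output above:

    # on the EU cluster, register the US cluster as a remote cluster
    curl -X PUT "localhost:9200/_cluster/settings" \
      -H 'Content-Type: application/json' \
      -d '{
        "persistent": {
          "cluster": {
            "remote": {
              "us": {
                "seeds": ["172.18.1.11:9300"]
              }
            }
          }
        }
      }'

    # remote indices can then be queried with the <alias>:<index> syntax
    curl -X GET "localhost:9200/us:projects/_search?pretty"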

Ok, I'll try that and see if it helps. But meanwhile, I wonder why it happens only to indices with more than one shard, since the devicelocations index never gets an unassigned status.
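
(For comparison, the primary and replica shard counts per index can be listed like this, a sketch assuming the same node address:)

    # pri = number of primary shards, rep = number of replicas per primary
    curl -X GET "localhost:9200/_cat/indices/projects,devicelocations?v&h=index,pri,rep,health"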
