The shard cannot be allocated to the same node on which a copy of the shard already exists

Suddenly the count operation on an index returned a few million fewer results than usual.
Looking at the application logs, I see stack traces when trying to add new documents.


    File "/myapp/MetaControl/MetaController.py", line 442, in save_av_analysis
      self.av_db.index(index="av_index_two", doc_type="document", id=file_id, body={"av": unique_av_scans_vect}, request_timeout=3600)
    File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
      return func(*args, params=params, **kwargs)
    File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 319, in index
      _make_path(index, doc_type, id), params=params, body=body)
    File "/usr/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request
      status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
    File "/usr/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 186, in perform_request
      self._raise_error(response.status, raw_data)
    File "/usr/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
      raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
    2020-05-18T14:15:12+00:00 codexserver.local worker_private_vt[956]: TransportError: TransportError(503, u'unavailable_shards_exception', u'[av_index_two][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[av_index_two][1]] containing [index {[av_index_two][document][REDACTED], source[{"av":["REDACTED2 (score:9)"]}]}]]')

Looking at the Elasticsearch logs:


    [2020-05-18T14:22:16,914][WARN ][r.suppressed             ] path: /av_index_two/document/e1ec61a272e7489dc80741efd5a8c5672bc34dc1, params: {index=av_index_two, id=e1ec61a272e7489dc80741efd5a8c5672bc34dc1, type=document}                                                                                                                                                                       
    org.elasticsearch.action.UnavailableShardsException: [av_index_two][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[av_index_two][1]] containing [index {[av_index_two][document][REDACTED], source[{"av":["REDACTEDstrings"]}]}]]                                                                                                                           
            at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:862) [elasticsearch-5.3.2.jar:5.3.2]     
            at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryIfUnavailable(TransportReplicationAction.java:699) [elasticsearch-5.3.2.jar:5.3.2]          
            at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:653) [elasticsearch-5.3.2.jar:5.3.2]                       
            at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.3.2.jar:5.3.2]                                                               
            at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:816) [elasticsearch-5.3.2.jar:5.3.2]                 
            at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) [elasticsearch-5.3.2.jar:5.3.2]                                     
            at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) [elasticsearch-5.3.2.jar:5.3.2]                                  
            at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1162) [elasticsearch-5.3.2.jar:5.3.2]                                                          
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.3.2.jar:5.3.2]                                          
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]                                                                                         
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]                                                                                         
            at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]    
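
To narrow this down quickly, something like the following should list only the shards that are not started, together with the reason they are unassigned (a sketch based on the 5.x `_cat/shards` column names):

    curl -s 'http://127.0.0.1:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep -v STARTED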

The allocation explain output:

    curl http://127.0.0.1:9200/_cluster/allocation/explain?pretty
    {
      "index" : "av_index",
      "shard" : 2,
      "primary" : false,
      "current_state" : "unassigned",
      "unassigned_info" : {
        "reason" : "CLUSTER_RECOVERED",
        "at" : "2020-02-20T22:47:21.101Z",
        "last_allocation_status" : "no_attempt"
      },
      "can_allocate" : "no",
      "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
      "node_allocation_decisions" : [
        {
          "node_id" : "SaAmCeedRJCNPYkwpzQxog",
          "node_name" : "SaAmCee",
          "transport_address" : "127.0.0.1:9300",
          "node_decision" : "no",
          "deciders" : [
            {
              "decider" : "same_shard",
              "decision" : "NO",
              "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[av_index][2], node[SaAmCeedRJCNPYkwpzQxog], [P], s[STARTED], a[id=hvRyVAD9QfatQKqs6exQLg]]"
            }
          ]
        }
      ]
    }
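
Note that without a request body the explain API just picks an arbitrary unassigned shard (here a replica of av_index). The shard the indexing requests actually complain about can be targeted explicitly with something like this (a sketch):

    curl -XPOST 'http://127.0.0.1:9200/_cluster/allocation/explain?pretty' -H 'Content-Type: application/json' -d '{
      "index": "av_index_two",
      "shard": 1,
      "primary": true
    }'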

The Elasticsearch container has been up for 2 months. I haven't restarted it because it takes around 30 minutes to start, and I'm not sure a restart wouldn't make things worse.

`54ea98e8e860        elasticsearch:codex   "/docker-entrypoint.?"   8 months ago        Up 2 months         0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp                     elasticdb_elasticsearch_1`

Although the node is somewhat big, there is still space left on the drive.

    # du -h -d 2
    774G    ./nodes/0
    774G    ./nodes
    774G    .

The drive is not full:

    # df -h | grep " /$"
    /dev/md0         15T   12T  2.0T  86% /

cluster health:

    # curl http://127.0.0.1:9200/_cluster/health?pretty
    {
      "cluster_name" : "elasticsearch",
      "status" : "red",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 10,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 12,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 45.45454545454545
    }

version:

    curl http://127.0.0.1:9200
    {
      "name" : "SaAmCee",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "0CwCUa6BRlifbsQClTH51w",
      "version" : {
        "number" : "5.3.2",
        "build_hash" : "3068195",
        "build_date" : "2017-04-24T16:15:59.481Z",
        "build_snapshot" : false,
        "lucene_version" : "6.4.2"
      },
      "tagline" : "You Know, for Search"
    }

indices:

    # curl http://127.0.0.1:9200/_cat/indices?v
    health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open   .kibana      pfJfAgBxTBGfzTLwyeS8eQ   1   1          1            0      3.1kb          3.1kb
    red    open   av_index_two VWaMCx4vRkqY0gBFunvaNw   5   1   31213524       710773    617.2gb        617.2gb
    yellow open   av_index     a7Id1EeuQ_-3IMvPG15UQA   5   1    1866655         1850      1.4gb          1.4gb

(There haven't been any config changes in the hours before this started happening.)

    $ curl http://127.0.0.1:9200/_cat/shards?v
    index        shard prirep state         docs   store ip        node
    av_index     2     p      STARTED     372622 292.9mb 127.0.0.1 SaAmCee
    av_index     2     r      UNASSIGNED
    av_index     3     p      STARTED     373382 291.3mb 127.0.0.1 SaAmCee
    av_index     3     r      UNASSIGNED
    av_index     1     p      STARTED     373211   304mb 127.0.0.1 SaAmCee
    av_index     1     r      UNASSIGNED
    av_index     4     p      STARTED     373965 303.3mb 127.0.0.1 SaAmCee
    av_index     4     r      UNASSIGNED
    av_index     0     p      STARTED     373475 303.3mb 127.0.0.1 SaAmCee
    av_index     0     r      UNASSIGNED
    .kibana      0     p      STARTED          1   3.1kb 127.0.0.1 SaAmCee
    .kibana      0     r      UNASSIGNED
    av_index_two 2     p      STARTED    7801315   154gb 127.0.0.1 SaAmCee
    av_index_two 2     r      UNASSIGNED
    av_index_two 3     p      STARTED    7806033 154.4gb 127.0.0.1 SaAmCee
    av_index_two 3     r      UNASSIGNED
    av_index_two 4     p      STARTED    7804106 154.5gb 127.0.0.1 SaAmCee
    av_index_two 4     r      UNASSIGNED
    av_index_two 1     p      UNASSIGNED
    av_index_two 1     r      UNASSIGNED
    av_index_two 0     p      STARTED    7804052 154.2gb 127.0.0.1 SaAmCee
    av_index_two 0     r      UNASSIGNED

    java -version
    openjdk version "1.8.0_121"
    OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-1~bpo8+1-b13)
    OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

I have found other topics with the same error, but without any replies.

This is a replica, but you only have one node, so of course no replicas can be allocated. You should set number_of_replicas: 0 on all your indices.
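
Something along these lines should apply that to all indices at once (adjust the index pattern if you only want to touch specific ones):

    curl -XPUT 'http://127.0.0.1:9200/_all/_settings' -H 'Content-Type: application/json' -d '{
      "index" : { "number_of_replicas" : 0 }
    }'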

Thanks! After setting number_of_replicas to 0, the yellow indices went green, but the red one stayed red:

    curl http://127.0.0.1:9200/_cat/indices?v
    health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    green  open   .kibana      pfJfAgBxTBGfzTLwyeS8eQ   1   0          1            0      3.1kb          3.1kb
    red    open   av_index_two VWaMCx4vRkqY0gBFunvaNw   5   0   31246541       709402    617.6gb        617.6gb
    green  open   av_index     a7Id1EeuQ_-3IMvPG15UQA   5   0    1866655         1850      1.4gb          1.4gb

The allocation explanation does not give me very good news:

    # curl http://127.0.0.1:9200/_cluster/allocation/explain?pretty
    {
      "index" : "av_index_two",
      "shard" : 1,
      "primary" : true,
      "current_state" : "unassigned",
      "unassigned_info" : {
        "reason" : "ALLOCATION_FAILED",
        "at" : "2020-05-17T01:38:32.247Z",
        "failed_allocation_attempts" : 1,
        "details" : "shard failure, reason [merge failed], failure MergeException[org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=55daf8c actual=42f93512 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/VWaMCx4vRkqY0gBFunvaNw/1/index/_4hiha.cfs\") [slice=_4hiha_Lucene50_0.pos]))]; nested: CorruptIndexException[checksum failed (hardware problem?) : expected=55daf8c actual=42f93512 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/VWaMCx4vRkqY0gBFunvaNw/1/index/_4hiha.cfs\") [slice=_4hiha_Lucene50_0.pos]))]; ",
        "last_allocation_status" : "no_valid_shard_copy"
      },
      "can_allocate" : "no_valid_shard_copy",
      "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
      "node_allocation_decisions" : [
        {
          "node_id" : "SaAmCeedRJCNPYkwpzQxog",
          "node_name" : "SaAmCee",
          "transport_address" : "127.0.0.1:9300",
          "node_decision" : "no",
          "store" : {
            "in_sync" : true,
            "allocation_id" : "Izm6WM2LTHmDi0xtj7vS7Q",
            "store_exception" : {
              "type" : "corrupt_index_exception",
              "reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
              "caused_by" : {
                "type" : "i_o_exception",
                "reason" : "failed engine (reason: [merge failed])",
                "caused_by" : {
                  "type" : "corrupt_index_exception",
                  "reason" : "checksum failed (hardware problem?) : expected=55daf8c actual=42f93512 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/VWaMCx4vRkqY0gBFunvaNw/1/index/_4hiha.cfs\") [slice=_4hiha_Lucene50_0.pos]))"
                }
              }
            }
          }
        }
      ]
    }

The server is on a RAID10 which seems healthy:

    $ cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md0 : active raid5 sdb3[1] sde3[4] sda3[0] sdd3[3] sdc3[2]
          15596290048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
          bitmap: 8/30 pages [32KB], 65536KB chunk

    unused devices: <none>

The docs.count of the corrupted index still increases over time (around half of the indexing operations succeed, half fail), so I'll try forcing a RAID resync and see if it gets fixed.
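
If the resync doesn't help, as far as I can tell the remaining options are restoring the index from a snapshot or explicitly allocating an empty primary for that shard, which throws away whatever data it held. A sketch of the latter, only as a last resort:

    curl -XPOST 'http://127.0.0.1:9200/_cluster/reroute?pretty' -H 'Content-Type: application/json' -d '{
      "commands" : [
        {
          "allocate_empty_primary" : {
            "index" : "av_index_two",
            "shard" : 1,
            "node" : "SaAmCee",
            "accept_data_loss" : true
          }
        }
      ]
    }'
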
Thanks

EDIT: Oops, it is actually RAID5. Anyway, thanks.
