Why is synced flush failing for this index?

Elasticsearch 6.8.6. The index has ten primary shards with replicas set to 1, and each shard is ~30GB. The index (and cluster) health is green. No data is being written to the index. A synced flush consistently fails on all of its shards with the same reason. Can anyone explain what the reason given below means? It makes no sense to me and I can't find anything that explains it.

$ curl -s -u username -XPOST "https://$(hostname):9200/lots_of_data_index/_flush/synced?pretty"
{
  "_shards" : {
    "total" : 20,
    "successful" : 0,
    "failed" : 20
  },
  "lots_of_data_index" : {
    "total" : 20,
    "successful" : 0,
    "failed" : 20,
    "failures" : [
      {
        "shard" : 1,
        "reason" : "[node_one][10.70.13.155:9300][internal:indices/flush/synced/sync]",
        "routing" : {
          "state" : "STARTED",
          "primary" : false,
          "node" : "ui6ied_6Tx2BO1pmyQM7gw",
          "relocating_node" : null,
          "shard" : 1,
          "index" : "lots_of_data_index",
          "allocation_id" : {
            "id" : "t4OjXQtDQBedLiBiF3OCJg"
          }
        }
      },
      {
        "shard" : 1,
        "reason" : "[node_two][10.70.13.9:9300][internal:indices/flush/synced/sync]",
        "routing" : {
          "state" : "STARTED",
          "primary" : true,
          "node" : "uERyGgVUTMy0SKj0x2mq5g",
          "relocating_node" : null,
          "shard" : 1,
          "index" : "lots_of_data_index",
          "allocation_id" : {
            "id" : "KcMDyJhnQDKcK1bFNRSe2Q"
          }
        }
      },
[ truncated because reason on the other 18 shards is the same ]

A synced flush works on all shards of all other indices I've tried it on, except those where data is being written to the index; there the reason for failure is given as coherent, self-explanatory English like "pending operations" or "ongoing operations on primary".

Hmm yes that is deeply unhelpful isn't it? The quickest way I can see to get more information is with this trace logger:

PUT _cluster/settings
{"transient":{"logger.org.elasticsearch.indices.flush.SyncedFlushService":"TRACE"}}

Once you've got what you need, you can disable it like this:

PUT _cluster/settings
{"transient":{"logger.org.elasticsearch.indices.flush.SyncedFlushService":null}}

You're looking for a message containing the string "error while performing synced flush", which should include the full exception and a stack trace.
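Since the question uses curl, the same settings can be applied that way too. A sketch, reusing the placeholder hostname and credentials from the question:

```shell
# Enable TRACE logging for the synced flush service (transient, so it
# does not survive a full cluster restart)
curl -s -u username -H 'Content-Type: application/json' \
  -XPUT "https://$(hostname):9200/_cluster/settings" -d '
{"transient":{"logger.org.elasticsearch.indices.flush.SyncedFlushService":"TRACE"}}'

# Set the logger back to null once you have what you need
curl -s -u username -H 'Content-Type: application/json' \
  -XPUT "https://$(hostname):9200/_cluster/settings" -d '
{"transient":{"logger.org.elasticsearch.indices.flush.SyncedFlushService":null}}'
```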

Having enabled that, an explanation for the failure is now logged:

[2020-02-17T10:18:59,121][TRACE][o.e.i.f.SyncedFlushService] [master_one]    [lots_of_data_index][6] error while performing synced flush on [[lots_of_data_index][6], node[ui6ied_6Tx2BO1pmyQM7gw], [R], s[STARTED], a[id=dKRbRASQTD6m9piC_GHDFQ]], skipping
org.elasticsearch.transport.RemoteTransportException: [node_one][10.70.13.155:9300][internal:indices/flush/synced/sync]
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: unsupported_operation_exception: syncedFlush is not supported on a read-only engine

It's read-only because the index is frozen! Unfreeze the index and the synced flush works. It somehow didn't occur to me that I was trying to perform an operation that modifies the index on an index I'd previously put into a read-only state. It makes sense that a synced flush doesn't work on a frozen index, but it would be nice if an attempt to do so returned a useful reason, as happens when trying it on a closed index:

{
  "_shards" : {
    "total" : 2,
    "successful" : 0,
    "failed" : 2
  },
  "a_closed_index" : {
    "total" : 2,
    "successful" : 0,
    "failed" : 2,
    "failures" : [
      {
        "shard" : 0,
        "reason" : "closed"
      }
    ]
  }
}
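For anyone else who hits this: you can check up front whether an index is frozen by looking at its settings, and the unfreeze API (part of X-Pack since 6.6) makes it writable again so the synced flush can go through. A sketch of the steps I took, using the index name from this thread; re-freezing at the end is optional:

```
# A frozen index has "index.frozen": "true" in its settings
GET /lots_of_data_index/_settings

# Unfreeze, run the synced flush, then freeze again if desired
POST /lots_of_data_index/_unfreeze
POST /lots_of_data_index/_flush/synced
POST /lots_of_data_index/_freeze
```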
