Index becomes read-only for an unknown reason

I am using ILM via Logstash to manage my indices.
I have a new index that was created on 31/10, and for some unknown reason it has become read-only; I noticed it has index.blocks.write: true set on it.

My ILM policy shouldn't have done this, as none of its conditions were met. The policy:

{
    "policy": {
        "phases": {
            "hot": {
                "min_age": "0ms",
                "actions": {
                    "rollover": {
                        "max_size": "100gb"
                    },
                    "set_priority": {
                        "priority": 100
                    }
                }
            },
            "warm": {
                "min_age": "0ms",
                "actions": {
                    "allocate": {
                        "number_of_replicas": 1,
                        "include": {},
                        "exclude": {},
                        "require": {
                            "box_type": "warm"
                        }
                    },
                    "forcemerge": {
                        "max_num_segments": 1
                    },
                    "set_priority": {
                        "priority": 50
                    },
                    "shrink": {
                        "number_of_shards": 1
                    }
                }
            },
            "cold": {
                "min_age": "30d",
                "actions": {
                    "freeze": {},
                    "set_priority": {
                        "priority": 0
                    }
                }
            },
            "delete": {
                "min_age": "365d",
                "actions": {
                    "delete": {}
                }
            }
        }
    }
}

I have DEBUG logs from the Master at the time the index became read-only:

[2019-11-04T12:30:04,705][DEBUG][o.e.i.IndicesService ] [Master] creating Index [[12months-retention-2019.10.31-000001/IQvq2eTrSqyX-Gd476zvLA]], shards [5]/[1] - reason [metadata verification]
[2019-11-04T12:30:04,706][DEBUG][o.e.i.m.MapperService ] [Master] [12months-retention-2019.10.31-000001] using dynamic[true]
[2019-11-04T12:30:04,707][DEBUG][o.e.i.c.b.BitsetFilterCache] [Master] [12months-retention-2019.10.31-000001] clearing all bitsets because [close]
[2019-11-04T12:30:04,707][DEBUG][o.e.i.c.q.IndexQueryCache] [Master] [12months-retention-2019.10.31-000001] full cache clear, reason [close]
[2019-11-04T12:30:04,707][DEBUG][o.e.i.c.b.BitsetFilterCache] [Master] [12months-retention-2019.10.31-000001] clearing all bitsets because [close]
[2019-11-04T12:31:48,285][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [12months-retention-2019.10.31-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]

This is the index template:

{
  "order": 0,
  "version": 60001,
  "index_patterns": [
    "12months-retention-*"
  ],
  "settings": {
    "index": {
      "lifecycle": {
        "name": "12months_policy",
        "rollover_alias": "12months-retention"
      }
    }
  },
  "mappings": {},
  "aliases": {}
}

This is the current index status:

No new index was created; the index did not roll over.
The Elasticsearch version is 6.8.3

Please advise, thanks.

The read-only block is only applied just prior to the forcemerge or shrink actions. It appears, though, that this index is still in the rollover stage.
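In the meantime, if you need to write to the index again, manually clearing the setting should work (substitute your index name):

PUT /<index-name>/_settings
{
  "index.blocks.write": null
}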

Can you provide the output of:

GET /<index-name>/_ilm/explain?human

So I can see where the index currently is?

Hi,

{
  "indices" : {
    "12months-retention-2019.10.31-000001" : {
      "index" : "12months-retention-2019.10.31-000001",
      "managed" : true,
      "policy" : "12months_policy",
      "lifecycle_date" : "2019-10-31T10:45:01.688Z",
      "lifecycle_date_millis" : 1572518701688,
      "phase" : "hot",
      "phase_time" : "2019-11-10T08:46:32.139Z",
      "phase_time_millis" : 1573375592139,
      "action" : "rollover",
      "action_time" : "2019-10-31T10:50:15.203Z",
      "action_time_millis" : 1572519015203,
      "step" : "check-rollover-ready",
      "step_time" : "2019-11-10T08:46:32.139Z",
      "step_time_millis" : 1573375592139,
      "phase_execution" : {
        "policy" : "12months_policy",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "100gb"
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "version" : 2,
        "modified_date" : "2019-10-07T13:55:05.451Z",
        "modified_date_in_millis" : 1570456505451
      }
    }
  }
}

Thanks,

Okay, judging from that, the index should not have been marked as read-only. Do you know if any other process could have done this? It appears that ILM is not in a position to do it.

Also, does this problem reproduce on any of your other indices? And can you confirm this is the write block and not the write-but-allow-deletes block?
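To check, the index settings show which block is set:

GET /<index-name>/_settings?filter_path=*.settings.index.blocks

index.blocks.write: true is the full write block (the one ILM sets before shrink and forcemerge), whereas index.blocks.read_only_allow_delete: true is the block Elasticsearch applies on its own when the flood-stage disk watermark is exceeded.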

Hi,

I removed that index and started over with a new index under a completely different policy and alias.

The index became read-only again:

This is a different cluster from the one in the initial post.
I have the exact same configuration on other clusters, all on the same version, but the issue does not occur there.

The ILM policy:

 {
    "policy": {
        "phases": {
            "hot": {
                "min_age": "0ms",
                "actions": {
                    "rollover": {
                        "max_size": "100gb"
                    },
                    "set_priority": {
                        "priority": 100
                    }
                }
            },
            "warm": {
                "min_age": "0ms",
                "actions": {
                    "allocate": {
                        "number_of_replicas": 1,
                        "include": {},
                        "exclude": {},
                        "require": {
                            "box_type": "warm"
                        }
                    },
                    "forcemerge": {
                        "max_num_segments": 1
                    },
                    "set_priority": {
                        "priority": 50
                    },
                    "shrink": {
                        "number_of_shards": 1
                    }
                }
            },
            "cold": {
                "min_age": "30d",
                "actions": {
                    "freeze": {},
                    "set_priority": {
                        "priority": 0
                    }
                }
            },
            "delete": {
                "min_age": "365d",
                "actions": {
                    "delete": {}
                }
            }
        }
    }
}

BTW, ILM is controlled by Logstash using the following configuration:

    elasticsearch {
        hosts => ['xxx1', 'xxx2']
        sniffing => true
        ilm_enabled => true
        ilm_policy => "ltr12_policy"
        ilm_rollover_alias => "ltr12-services"
        manage_template => true
        template_name => "ltr12-services"
        template => "/etc/logstash/index_templates/ltr12-services.json"
        template_overwrite => true
    }

More findings

It seems the index becomes read-only every day at the same time (12:30 EDT).
Looking at the Master (debug) logs at that time shows these:

[2019-11-15T11:45:12,649][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [ltr12-services-2019.11.10-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]
[2019-11-15T11:55:12,652][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [ltr12-services-2019.11.10-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]
[2019-11-15T12:05:12,653][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [ltr12-services-2019.11.10-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]
[2019-11-15T12:15:12,659][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [ltr12-services-2019.11.10-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]
[2019-11-15T12:25:12,653][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [ltr12-services-2019.11.10-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]
[2019-11-15T12:30:04,357][DEBUG][o.e.i.IndicesService ] [Master] creating Index [[ltr12-services-2019.11.10-000001/gipLHlPzSeedZBWyiQRQNw]], shards [2]/[1] - reason [metadata verification]
[2019-11-15T12:30:04,358][DEBUG][o.e.i.m.MapperService ] [Master] [ltr12-services-2019.11.10-000001] using dynamic[true]
[2019-11-15T12:30:04,359][DEBUG][o.e.i.c.b.BitsetFilterCache] [Master] [ltr12-services-2019.11.10-000001] clearing all bitsets because [close]
[2019-11-15T12:30:04,359][DEBUG][o.e.i.c.q.IndexQueryCache] [Master] [ltr12-services-2019.11.10-000001] full cache clear, reason [close]
[2019-11-15T12:30:04,359][DEBUG][o.e.i.c.b.BitsetFilterCache] [Master] [ltr12-services-2019.11.10-000001] clearing all bitsets because [close]
[2019-11-15T12:35:12,659][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [ltr12-services-2019.11.10-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]
[2019-11-15T12:45:12,660][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [ltr12-services-2019.11.10-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]
[2019-11-15T12:55:12,654][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [ltr12-services-2019.11.10-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]
[2019-11-15T13:05:12,652][DEBUG][o.e.x.i.IndexLifecycleRunner] [Master] [ltr12-services-2019.11.10-000001] running periodic policy with current-step [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}]

Can you help explain this behavior?

Finally found the root cause.
It was a periodic Curator job that sets index.blocks.write and runs a forceMerge on any index older than X days.
Curator is supposed to skip ILM-managed indices by default, which is why I didn't check it before.
It turns out, though, that this has only been supported since Curator version 5.7.

I solved this by upgrading Curator to 5.7.
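For anyone who hits the same thing: since Curator 5.7, actions have an allow_ilm_indices option that defaults to false, so ILM-managed indices are filtered out automatically. A sketch of the kind of action file that was causing this (the age threshold and descriptions are illustrative, not my real config):

actions:
  1:
    action: index_settings
    description: Mark indices older than X days read-only
    options:
      index_settings:
        index:
          blocks:
            write: true
      # On Curator >= 5.7 this defaults to false, which skips ILM-managed indices
      allow_ilm_indices: false
    filters:
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 7   # placeholder for the real "X days"
  2:
    action: forcemerge
    description: Force-merge the read-only indices down to one segment
    options:
      max_num_segments: 1
      allow_ilm_indices: false
    filters:
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 7   # placeholder for the real "X days"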