Rollover of indices doesn't work any more

Hi community,

We're running an Elastic Cloud cluster with Elasticsearch and Kibana that has been giving us some trouble over the last few weeks.

After our cluster ran out of storage a few weeks ago, the cluster stopped accepting write operations. This had happened before and we had fixed it by setting "index.blocks.read_only_allow_delete" to null in /_all/_settings.
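
For reference, the fix back then was essentially this call (a sketch from memory of the standard index settings API):

// Clear the flood-stage read-only block on all indices
PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete" : null
}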

This time, however, our Logstash instance, which feeds messages to Elasticsearch, kept receiving error messages saying that the indices can't be written to because no index is marked as the write index. Trying to set is_write_index to true for the latest index didn't work, so we ended up deleting most of our indices and re-creating them.
All our indices follow the naming scheme <Environment>-<Application Name>-<Date>-<Number>, and the rollover alias is <Environment>-<Application Name>. For example:

PUT %3Cprod-maiconnect-%7Bnow%2Fd%7D-000001%3E
{
  "aliases" : {
    "prod-maiconnect" : {}
  },
  "mappings" : {
    "properties" : {
      "@timestamp" : {
        "type" : "date"
      },
      "@version" : {
        "type" : "keyword"
      },
      "contextMap" : {
        "dynamic" : "false",
        "properties" : {
          "request_id" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            },
            "norms" : false
          },
          "user" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            },
            "norms" : false
          },
          // ... many more properties redacted
        }
      },
      "endOfBatch" : {
        "type" : "boolean"
      },
      "environment" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "event" : {
        "properties" : {
          "original" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      },
      "host" : {
        "properties" : {
          "ip" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      },
      "instant" : {
        "properties" : {
          "epochSecond" : {
            "type" : "long"
          },
          "nanoOfSecond" : {
            "type" : "long"
          }
        }
      },
      "level" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        },
        "norms" : false
      },
      "loggerFqcn" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "loggerName" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        },
        "norms" : false
      },
      "marker" : {
        "properties" : {
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      },
      "marker_name" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        },
        "norms" : false
      },
      "message" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        },
        "norms" : false
      },
      "message_length" : {
        "type" : "long"
      },
      "thread" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "threadId" : {
        "type" : "long"
      },
      "threadPriority" : {
        "type" : "long"
      },
      "thrown" : {
        "dynamic" : "false",
        "properties" : {
          "message" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            },
            "norms" : false
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            },
            "norms" : false
          }
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "refresh_interval" : "30s",
      "blocks" : {
        "read_only_allow_delete" : "false"
      },
      "priority" : "50",
      "number_of_replicas" : "0",
      "lifecycle" : {
        "name" : "prod",
        "rollover_alias" : "prod-maiconnect",
        "indexing_complete" : "true"
      },
      "highlight" : {
        "max_analyzed_offset" : "7000000"
      },
      "routing" : {
        "allocation" : {
          "include" : {
            "_tier_preference" : "data_warm,data_hot"
          }
        }
      },
      "number_of_shards" : "1"
    }
  }
}

An index with the current date and the given number is then created and written to under the given alias. However, after some time we observe one of the following three erroneous behaviours (a sketch of the relevant alias and ILM checks follows the list):

  • Either the index doesn't roll over at all: Elasticsearch keeps writing messages to the index with the date and number from when it was created, and that index grows and grows until we run into disk space problems again.

  • Or, when the index is supposed to roll over, it simply disappears, and as soon as the next incoming message is processed, a new index is created that is named exactly like the rollover alias.
    This index then goes into an error state with this message:

    Index lifecycle error
    illegal_argument_exception: index.lifecycle.rollover_alias [prod-maiconnect-*] does not point to index [prod-maiconnect]

  • Or the index does roll over successfully, but the new index has is_write_index = false and we get this error:

    illegal_argument_exception: index [prod-maiconnect-2023.06.11-000008] is not the write index for alias [prod-maiconnect]
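
For reference, the alias and lifecycle state can be inspected like this (a minimal sketch using the standard alias and ILM explain APIs; prod-maiconnect is the alias from the example above):

// Which concrete indices carry the alias, and whether is_write_index is set on them
GET _alias/prod-maiconnect

// Current lifecycle step and any lifecycle error for the indices behind the alias
GET prod-maiconnect-*/_ilm/explain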

We manually set the index to be the write index for this alias (roughly the call sketched below), but after the next rollover it may show the same behaviour again, or one of the other two erroneous behaviours detailed above.
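
Setting the write index manually looks roughly like this (a sketch using the standard _aliases API; the index name is just the one from the error message above):

POST /_aliases
{
  "actions" : [
    {
      "add" : {
        "index" : "prod-maiconnect-2023.06.11-000008",
        "alias" : "prod-maiconnect",
        "is_write_index" : true
      }
    }
  ]
}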

We have tried deleting and re-creating the indices multiple times, but we always end up with the same problem.

Does anyone have a hint as to what we could do?

Your Elastic Cloud support team can help: https://www.elastic.co/support/welcome/cloud#how-to-open-case
