Curator 7.0.1 fails because of closed indices

Elasticsearch version: 7.17.1 (yes, I know.)

Curator version: 7.0.1

I find myself having to replace every server in our Elasticsearch cluster
with ones running RHEL 9. As part of this I've discovered there are newer
versions of Curator than the 5.8.2 we've used for years. Given that, and that
I can't find an RPM of 5.8.2 for RHEL 9, I've installed Curator 7.0.1 to try it.
Sadly it does not work at all: every action fails.

This is the config

---
client:
  hosts:
    - foo.bar.redact
  port: 9200
  url_prefix:
  use_ssl: True
  certificate: /etc/pki/tls/certs/ca-bundle.crt
  client_cert:
  client_key:
  ssl_no_validate: False
  username: "curator"
  password: "redact"
  timeout: 30
    #  master_only: True
  master_only: False

logging:
#  loglevel: INFO
  loglevel: DEBUG
  logformat: default
    #  blacklist: ['elasticsearch', 'urllib3']
  blacklist: []

(The commented-out lines are what we normally use; I've changed them to be
able to do test runs.)

We have far too many actions to post them all, but here's one:

---
actions:

  10000-linux-syslog-allocation:
    action: allocation
    description: "Apply shard allocation filtering rules to linux-syslog indexes"
    options:
      ignore_empty_list: True
      timeout_override: 300
      allocation_type: include
      key: datatype
      value: cold
    filters:
    - filtertype: closed
      exclude: True
    - filtertype: pattern
      kind: prefix
      value: linux-syslog-
    - filtertype: age
      source: field_stats
      field: '@timestamp'
      direction: older
      stats_result: min_value
      unit: days
      unit_count: 1

The DEBUG log for that action alone is 65 MB, so here's just the very last bit, where it fails:

2025-01-13 14:43:44,519 WARNING            elasticsearch log_request_fail:288  GET https://foo.bar.redact:9200/.kibana-web-development_7,.tasks,linux-syslog-2024-14-avoidclosed-c2024-29c,linux-syslog-2025-02,idm-radius-auth-detail-2024.11.01,radius-auth-detail-2024.12.13,radius-log-2024.10.27,radius-log-2024.12.07,radius-log-2024.12.09,radius-log-2025.01.04,radius-log-2025.01.11,httpd-access-2023-14,httpd-access-2024-15,httpd-access-2024-52,linux-syslog-2024.11.03,linux-syslog-2024.12.12,linux-syslog8-2024.10.27,security-dhcp-2024.08.07,security-dhcp-2024.12.03,dns-queries-external-2024.11.01,dns-queries-external-2024.11.06,dns-queries-external-2024.11.11,dns-queries-external-2024.12.29,dns-queries-internal-2024.10.18,dns-queries-internal-2024.10.26,dns-queries-internal-2024.11.09,dns-queries-internal-2024.11.19,dns-rpz-2024.12,firepower-2024.07.17-avoidclosed-c2024.10.22c,firepower-2024.07.19-avoidclosed-c2024.11.06c,firepower-2024.08.05,firepower-2024.09.15-avoidclosed-c2024.12.18c,firepower-2024.10.28,firepower-2024.10.29,firepower-2024.11.14,firepower-2024.11.29,firepower-2025.01.04,firepower-2025.01.09,security-2024.08.31,security-2024.10.18,security-2024.11.17,security-2024.11.19,security-2024.11.28,security-2024.11.30,security-netflow-2024.12.22,security-vpn-2025.01.04,security-vpn-2025.01.12,linux-desktop-syslog-2024-07-avoidclosed-c2024-41c,linux-desktop-syslog-2024-23-avoidclosed-c2024-41c,server-eventlog-security-2024.10.14,server-eventlog-security-2024.12.08,server-eventlog-security-2024.12.25,server-firewall-prod-2024.07.28,server-firewall-prod-2024.10.26,server-firewall-prod-2024.11.09,server-firewall-prod-2024.11.15,server-horizon-2024.12,server-iis-2024.11.24,server-iis-2024.12.12,server-iis-2024.12.14,server-iis-2024.12.24,server-sql-2024.10.17,server-sql-2024.10.24,server-sql-2024.12.31,eventlog-2024.08.25-avoidclosed-c2024.11.28c,eventlog-2024.08.26-avoidclosed-c2024.11.28c,eventlog-2024.09.19-avoidclosed-c2024.12.19c,eventlog-2024.11.22,eventlog-2024.11.28,eventlog-2024.12.04,eventlog-2025.01.06,eventlog-2025.01.07,freenas-2024.11.16,freenas-2024.12.21,freenas-2024.12.24,access-2024.09.12,web-development-access-2024.12.02,access-2025.01.02,application-2024.10.28,application-2024.12.31,audit-2022-27,audit-2024-05,audit-2024-42,audit-2024-46/_stats/store,docs [status:400 request:0.129s]
2025-01-13 14:43:44,519 DEBUG              elasticsearch log_request_fail:308  > None
2025-01-13 14:43:44,519 DEBUG              elasticsearch log_request_fail:313  < {"error":{"root_cause":[{"type":"index_closed_exception","reason":"closed","index_uuid":"cTItOPS3SBiZs0jm5-U8xA","index":"httpd-access-2023-14"}],"type":"index_closed_exception","reason":"closed","index_uuid":"cTItOPS3SBiZs0jm5-U8xA","index":"httpd-access-2023-14"},"status":400}
2025-01-13 14:43:44,519 ERROR                curator.cli run:211  Failed to complete action: allocation.  <class 'KeyError'>: 'indices'

This is another action:

actions:

  10000-linux-syslog-close:
    action: close
    description: "Close linux-syslog indexes"
    options:
      delete_aliases: True
      ignore_empty_list: True
      timeout_override: 300
    filters:
    - filtertype: closed
      exclude: True
    - filtertype: pattern
      kind: prefix
      value: linux-syslog-
    - filtertype: pattern
      kind: regex
      value: '.*-avoidclosed-.*'
      exclude: True
    - filtertype: age
      source: field_stats
      field: '@timestamp'
      direction: older
      stats_result: max_value
      unit: days
      unit_count: 91

and that also fails when Curator tries to get information about a closed index.

The queries in the log when an action fails, and the index Curator complains is closed, are not the same on every run. I've checked, and the indices it complains about are in fact closed. It seems Curator is not filtering out closed indices, as it should by default and has been explicitly told to do. We have lots of closed indices; not having closed indices is not a viable solution to this problem.
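One way to double-check which indices are closed, independently of Curator, is to ask `_cat/indices` for every index including closed ones (for example `GET _cat/indices?h=index,status&format=json&expand_wildcards=all`) and filter the result. The Python sketch below only does the filtering: it assumes you have already fetched and parsed that JSON, and the function name `closed_matching` is hypothetical.

```python
from typing import Iterable


def closed_matching(cat_rows: Iterable[dict], prefix: str) -> list[str]:
    """Return names of closed indices whose name starts with `prefix`.

    `cat_rows` is assumed to be the parsed JSON from
    GET _cat/indices?h=index,status&format=json&expand_wildcards=all,
    where each row looks like {"index": "...", "status": "open" or "close"}.
    """
    return sorted(
        row["index"]
        for row in cat_rows
        if row["index"].startswith(prefix) and row["status"] == "close"
    )
```

Comparing that list with the index named in an `index_closed_exception` confirms whether Curator's view of "closed" matches the cluster's.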

I did a DEBUG run of the first action mentioned above with Curator 5.8.4 and from the output I can see that it never attempts to make a GET request for _stats/store,docs for the index httpd-access-2023-14, as 7.0.1 does.

I'm confused/surprised by how Curator is gathering lots of information about indices which do not match the prefix specified in the action, but even if it did only look at the indices which match the prefix value it would still encounter closed ones. And a DEBUG run with Curator 5.8.4 shows it making the same GET request for _stats/store,docs about indices which aren't relevant to the action.

Can anyone explain what's going on here and how to make Curator 7.0.1 not fail because of closed indices?

(I am aware of Index Lifecycle Management, but it seems like moving over to
it would be a huge effort, and I would very much prefer not to have to deal
with that at present.)

Sorry for the delayed response. I was out of office.

I am indeed puzzled by this behavior since the get_indices function clearly includes expand_wildcards=open,closed.

It might be useful to attempt to use Curator 8 with this, just to see if it behaves differently. There are a few things that will need to be done for it to work in your 7.x environment.

  1. The client configuration file format is slightly different in 8.x than in 7.x. Note that the client key is now a sub-key of elasticsearch, and that some of the settings, namely username and password, are under another sub-key called other_settings.
  2. There is another setting that you will need that goes underneath other_settings called skip_version_test. This is a boolean value that defaults to false. If you set it to true, it will allow Curator 8 to work against your 7.17 cluster.
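The re-nesting described in the two points above can be sketched as a small Python function that takes the 7.x `client:` section (as a dict) and returns the 8.x `elasticsearch:` layout. The 8.x key names (`ca_certs`, `verify_certs`, `request_timeout`, full-URL `hosts`) are my reading of the 8.x client configuration docs and should be checked against them; the function name `to_curator8_client` is hypothetical.

```python
def to_curator8_client(old: dict) -> dict:
    """Sketch: re-nest a Curator 7.x `client:` dict into an 8.x-style layout.

    Assumption: 8.x wants full URLs in `hosts` instead of separate
    `port`/`use_ssl` keys; verify key names against the 8.x docs.
    """
    scheme = "https" if old.get("use_ssl") else "http"
    port = old.get("port", 9200)
    client = {
        "hosts": [f"{scheme}://{host}:{port}" for host in old.get("hosts", [])],
        "request_timeout": old.get("timeout", 30),
        "verify_certs": not old.get("ssl_no_validate", False),
    }
    if old.get("certificate"):
        client["ca_certs"] = old["certificate"]
    # Only carry over client certificate settings that are actually set.
    for key in ("client_cert", "client_key"):
        if old.get(key):
            client[key] = old[key]
    other_settings = {
        "master_only": old.get("master_only", False),
        "username": old.get("username"),
        "password": old.get("password"),
        # Lets Curator 8 run against a 7.17 cluster.
        "skip_version_test": True,
    }
    return {"elasticsearch": {"client": client, "other_settings": other_settings}}
```

The returned dict can be dumped as YAML (for example with PyYAML's `yaml.safe_dump`) to get the `elasticsearch:` section of a Curator 8 client file.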


No worries, it's not like there's a Service Level Agreement. :smiley:

I believe I've found the cause of the problem and a fix that is adequate
for our purposes.

Whilst digging around some more in the huge DEBUG logs, I noticed this
error, which I hadn't spotted before:

2025-01-22 14:54:18,286 DEBUG              elasticsearch log_request_fail:313  < {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception cluster:monitor/state"}],"type":"security_exception","reason":"Unexpected exception cluster:monitor/state"},"status":500}

I then discovered that it was present in the logs of past debug runs too,
always before the index_closed_exception error. Looking in the
Elasticsearch log around the same time, I found this:

[master-down-goodbye] Unexpected exception
java.lang.IllegalArgumentException: Indices [.geoip_databases] use and access is reserved for system operations
java.lang.IllegalArgumentException: Indices [.geoip_databases] use and access is reserved for system operations
[ lots of typical Java error vomit follows ]

(I am well aware SearchGuard has nothing to do with Elastic.)

As I understand it, Curator makes a lot of queries to get data about
indices, which look like this:

GET https://foo.bar.redact:9200/_cluster/state/metadata/anindex,anotherindex,yetanotherindex,.kibana_something,and,so,on,and,so,on
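With thousands of indices, the names can't all go into one URL, so lookups like this are split into batches. Below is a minimal sketch of that kind of batching; the 3072-character limit and the name `chunk_csv` are assumptions for illustration, not Curator's actual code. It shows why one problem name, such as `.geoip_databases`, takes down the whole batch it lands in rather than just itself.

```python
def chunk_csv(names: list[str], max_len: int = 3072) -> list[str]:
    """Split index names into comma-separated strings of at most max_len chars.

    Order is preserved. A single name longer than max_len gets its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for name in names:
        # +1 for the comma that would join this name to the current chunk.
        extra = len(name) + (1 if current else 0)
        if current and length + extra > max_len:
            chunks.append(",".join(current))
            current, length = [], 0
            extra = len(name)
        current.append(name)
        length += extra
    if current:
        chunks.append(",".join(current))
    return chunks
```

Each chunk would then become one `GET /_cluster/state/metadata/<chunk>` request, so a 500 for that request loses the metadata for every index in the chunk.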

Eventually one of those queries would be done for a list of indices
which includes .geoip_databases, e.g.

2025-01-13 12:40:14,044 WARNING            elasticsearch log_request_fail:288  GET https://foo.bar.redact:9200/_cluster/state/metadata/.geoip_databases,.reporting-web-2024-12-15,linux-syslog-2024-11,radius-auth-2024.08.16,a,log,of,other,index,names [status:500 request:0.017s]
2025-01-13 12:40:14,044 DEBUG              elasticsearch log_request_fail:308  > None
2025-01-13 12:40:14,044 DEBUG              elasticsearch log_request_fail:313  < {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception cluster:monitor/state"}],"type":"security_exception","reason":"Unexpected exception cluster:monitor/state"},"status":500}

The index that Curator later complained was closed was always one of those
in the earlier query that included .geoip_databases. So it appears that,
having failed to get information about some indices, Curator carries on
regardless. If any of those indices are closed, Curator doesn't filter them
out, because it does not know they are closed. It then makes a
/_stats/store,docs query that includes one of those indices, which fails
because the index is closed, so Curator throws an error and stops.
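The failure mode described above can be reproduced in miniature. This is a hypothetical Python model, not Curator's code: `index_states` and `drop_closed` are invented names, and a `RuntimeError` stands in for the HTTP 500. With `strict=False`, an index whose state was never learned is treated like an open one and survives the closed filter, which appears to be what happens in 7.0.1; `strict=True` shows the safer choice of excluding anything whose state is unknown.

```python
from typing import Callable


def index_states(batches: list[list[str]],
                 fetch: Callable[[list[str]], dict[str, str]]) -> dict[str, str]:
    """Map each index to 'open', 'close', or 'unknown' if its batch failed."""
    states: dict[str, str] = {}
    for batch in batches:
        try:
            states.update(fetch(batch))
        except RuntimeError:
            # Stands in for the 500 from _cluster/state/metadata:
            # the state of every index in this batch is lost.
            states.update({name: "unknown" for name in batch})
    return states


def drop_closed(states: dict[str, str], strict: bool) -> list[str]:
    """Exclude closed indices; with strict=True, also exclude unknown ones."""
    excluded = {"close", "unknown"} if strict else {"close"}
    return sorted(name for name, state in states.items() if state not in excluded)
```

In the non-strict case the closed `httpd-access-2023-14` survives filtering because it shared a batch with `.geoip_databases`, which matches the later `index_closed_exception`.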

Having figured out that the .geoip_databases index is where Elasticsearch
puts the updates it downloads for the GeoIP databases used by the GeoIP
processor (GeoIP processor | Elasticsearch Guide [7.17] | Elastic),
I knew we didn't need it, because we do not use ingest pipelines. (Our
architecture was designed with Elasticsearch 2, when the concept of an
ingest node didn't exist. We use Logstash to munge and/or enrich data
before it reaches Elasticsearch.) So I did:

$ curl -XPUT -u bob "https://$(hostname):9200/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' \
    -d '{ "persistent": { "ingest.geoip.downloader.enabled": false } }'
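For anyone who prefers to script it, the same settings change can be expressed with Python's standard library. This sketch only builds the request; the host is a placeholder, the helper name `geoip_downloader_request` is hypothetical, and authentication (the `-u bob` in the curl command) and TLS settings are left out, so those would need adding before sending it with `urllib.request.urlopen`.

```python
import json
import urllib.request


def geoip_downloader_request(host: str, enabled: bool) -> urllib.request.Request:
    """Build (but do not send) a PUT _cluster/settings request that
    toggles ingest.geoip.downloader.enabled as a persistent setting."""
    body = {"persistent": {"ingest.geoip.downloader.enabled": enabled}}
    return urllib.request.Request(
        f"https://{host}:9200/_cluster/settings?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```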

which caused Elasticsearch to delete the .geoip_databases index. After this the problem of Curator failing because of closed indices as described in my original post has gone away.

It is not evident to me why the existence of the .geoip_databases index
was not a problem for Curator 5.8.4.

I wonder if the /_cluster/state/metadata/.geoip_databases query would work when done as a user with more permissions than the curator user, and hence whether the problem could be solved by giving the curator user more permissions. Unfortunately I didn't think of that until after I'd deleted the index. Setting ingest.geoip.downloader.enabled back to true didn't result in the index being recreated, even after several days, so I've given up on getting it back. (And I will very soon replace the remaining nodes that have Internet access with new ones that do not.) As such, I haven't explored solving the issue with permissions. I haven't tried Curator 8 either, as there doesn't seem to be any point when I can't recreate the thing that caused 7 to fail.

I think this reply I made recently could help address this:

The snippet doesn't get to the good part, which is the index_pattern option.

Do you mean search_pattern rather than index_pattern (which doesn't appear in your reply)?

The cluster I'm working with currently has 4893 indices, and Curator takes hours to do all the actions. Using search_pattern to limit the indices Curator gathers data for to only those relevant to a given action sounds great: it would greatly reduce the time it takes to run all the actions, and make the DEBUG logs much smaller if we find ourselves needing them again. It's a shame search_pattern doesn't exist in Curator 7. I might give Curator 8 a go, with the modifications you mentioned before and search_pattern, if I have the time though.
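To see roughly why narrowing the index list helps at this scale, here is a hypothetical model that applies a glob before counting per-index metadata requests. It uses Python's `fnmatch` and an invented `lookups_needed` helper with an assumed fixed batch size; it illustrates the effect only and does not claim to reflect how Curator 8 implements `search_pattern`.

```python
import math
from fnmatch import fnmatchcase


def lookups_needed(indices: list[str], pattern: str = "*", batch_size: int = 100) -> int:
    """Count metadata requests needed if only names matching `pattern`
    are looked up, `batch_size` names per request (an assumption)."""
    selected = [name for name in indices if fnmatchcase(name, pattern)]
    return math.ceil(len(selected) / batch_size)
```

With 4893 indices of which, say, 300 are `linux-syslog-*`, the unfiltered case needs 49 requests under these assumptions, while `lookups_needed(indices, "linux-syslog-*")` needs 3, and every later per-index step shrinks in the same proportion.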