Elasticsearch version: 7.17.1 (yes, I know.)
Curator version: 7.0.1
I find myself in the position of having to replace every server in our
Elasticsearch cluster with ones running RHEL 9. As part of this I've
discovered there are newer versions of Curator than the 5.8.2 we've used
for years. Given that, and that I can't find an rpm of 5.8.2 for RHEL 9,
I've installed Curator 7.0.1 to try. Sadly it does not work at all:
every action fails.
This is the config:
---
client:
  hosts:
    - foo.bar.redact
  port: 9200
  url_prefix:
  use_ssl: True
  certificate: /etc/pki/tls/certs/ca-bundle.crt
  client_cert:
  client_key:
  ssl_no_validate: False
  username: "curator"
  password: "redact"
  timeout: 30
  # master_only: True
  master_only: False
logging:
  # loglevel: INFO
  loglevel: DEBUG
  logformat: default
  # blacklist: ['elasticsearch', 'urllib3']
  blacklist: []
(The commented-out lines are what's normally used; I've changed them to
be able to do test runs.)
We have far too many actions to post them all, but here's one:
---
actions:
  10000-linux-syslog-allocation:
    action: allocation
    description: "Apply shard allocation filtering rules to linux-syslog indexes"
    options:
      ignore_empty_list: True
      timeout_override: 300
      allocation_type: include
      key: datatype
      value: cold
    filters:
      - filtertype: closed
        exclude: True
      - filtertype: pattern
        kind: prefix
        value: linux-syslog-
      - filtertype: age
        source: field_stats
        field: '@timestamp'
        direction: older
        stats_result: min_value
        unit: days
        unit_count: 1
The debug log for just that action is 65MB, so here's only the very last bit, where it fails:
2025-01-13 14:43:44,519 WARNING elasticsearch
log_request_fail:288 GET
https://foo.bar.redact:9200/.kibana-web-development_7,.tasks,linux-syslog-2024-14-avoidclosed-c2024-29c,linux-syslog-2025-02,idm-radius-auth-detail-2024.11.01,radius-auth-detail-2024.12.13,radius-log-2024.10.27,radius-log-2024.12.07,radius-log-2024.12.09,radius-log-2025.01.04,radius-log-2025.01.11,httpd-access-2023-14,httpd-access-2024-15,httpd-access-2024-52,linux-syslog-2024.11.03,linux-syslog-2024.12.12,linux-syslog8-2024.10.27,security-dhcp-2024.08.07,security-dhcp-2024.12.03,dns-queries-external-2024.11.01,dns-queries-external-2024.11.06,dns-queries-external-2024.11.11,dns-queries-external-2024.12.29,dns-queries-internal-2024.10.18,dns-queries-internal-2024.10.26,dns-queries-internal-2024.11.09,dns-queries-internal-2024.11.19,dns-rpz-2024.12,firepower-2024.07.17-avoidclosed-c2024.10.22c,firepower-2024.07.19-avoidclosed-c2024.11.06c,firepower-2024.08.05,firepower-2024.09.15-avoidclosed-c2024.12.18c,firepower-2024.10.28,firepower-2024.10.29,firepower-2024.11.14,firepower-2024.11.29,firepower-2025.01.04,firepower-2025.01.09,security-2024.08.31,security-2024.10.18,security-2024.11.17,security-2024.11.19,security-2024.11.28,security-2024.11.30,security-netflow-2024.12.22,security-vpn-2025.01.04,security-vpn-2025.01.12,linux-desktop-syslog-2024-07-avoidclosed-c2024-41c,linux-desktop-syslog-2024-23-avoidclosed-c2024-41c,server-eventlog-security-2024.10.14,server-eventlog-security-2024.12.08,server-eventlog-security-2024.12.25,server-firewall-prod-2024.07.28,server-firewall-prod-2024.10.26,server-firewall-prod-2024.11.09,server-firewall-prod-2024.11.15,server-horizon-2024.12,server-iis-2024.11.24,server-iis-2024.12.12,server-iis-2024.12.14,server-iis-2024.12.24,server-sql-2024.10.17,server-sql-2024.10.24,server-sql-2024.12.31,eventlog-2024.08.25-avoidclosed-c2024.11.28c,eventlog-2024.08.26-avoidclosed-c2024.11.28c,eventlog-2024.09.19-avoidclosed-c2024.12.19c,eventlog-2024.11.22,eventlog-2024.11.28,eventlog-2024.12.04,eventlog-2025.01.06,eventlog-2025.01.07,freenas-2024.11.16,freenas-2024.12.21,freenas-2024.12.24,access-2024.09.12,web-development-access-2024.12.02,access-2025.01.02,application-2024.10.28,application-2024.12.31,audit-2022-27,audit-2024-05,audit-2024-42,audit-2024-46/_stats/store,docs
[status:400 request:0.129s]
2025-01-13 14:43:44,519 DEBUG elasticsearch
log_request_fail:308 > None
2025-01-13 14:43:44,519 DEBUG elasticsearch
log_request_fail:313 <
{"error":{"root_cause":[{"type":"index_closed_exception","reason":"closed","index_uuid":"cTItOPS3SBiZs0jm5-U8xA","index":"httpd-access-2023-14"}],"type":"index_closed_exception","reason":"closed","index_uuid":"cTItOPS3SBiZs0jm5-U8xA","index":"httpd-access-2023-14"},"status":400}
2025-01-13 14:43:44,519 ERROR curator.cli
run:211 Failed to complete action: allocation. <class 'KeyError'>:
'indices'
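Presumably the KeyError comes from Curator expecting the normal _stats response shape. Here is a minimal reconstruction of the mismatch (not Curator's actual code, just the error body from the log above fed to the lookup a stats consumer would do):

```python
# Minimal reconstruction (not Curator's actual code) of how the 400 body
# above turns into KeyError: 'indices'. A successful _stats response has
# a top-level "indices" key; the error response has only "error"/"status".
error_body = {
    "error": {
        "type": "index_closed_exception",
        "reason": "closed",
        "index": "httpd-access-2023-14",
    },
    "status": 400,
}

try:
    per_index_stats = error_body["indices"]  # what a stats consumer expects
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints: KeyError: 'indices'
```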
This is another action:
actions:
  10000-linux-syslog-close:
    action: close
    description: "Close linux-syslog indexes"
    options:
      delete_aliases: True
      ignore_empty_list: True
      timeout_override: 300
    filters:
      - filtertype: closed
        exclude: True
      - filtertype: pattern
        kind: prefix
        value: linux-syslog-
      - filtertype: pattern
        kind: regex
        value: '.*-avoidclosed-.*'
        exclude: True
      - filtertype: age
        source: field_stats
        field: '@timestamp'
        direction: older
        stats_result: max_value
        unit: days
        unit_count: 91
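As a sanity check of the filter chain itself (illustrative only, with index names taken from the log above), the prefix and regex filters should narrow the candidate list like this:

```python
import re

# Illustrative check of the pattern filters in the close action above:
# keep the linux-syslog- prefix, then exclude anything matching
# .*-avoidclosed-.* (index names taken from the failing request log).
indices = [
    "linux-syslog-2025-02",
    "linux-syslog-2024-14-avoidclosed-c2024-29c",
    "httpd-access-2023-14",
]
after_prefix = [i for i in indices if i.startswith("linux-syslog-")]
candidates = [i for i in after_prefix
              if not re.search(r".*-avoidclosed-.*", i)]
print(candidates)  # ['linux-syslog-2025-02']
```

So the pattern filters look fine on paper; the problem is what happens before the age filter can even run.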
and that also fails when Curator tries to get information about a closed index.
The indices listed in the failing request, and the closed index it complains about, are not the same on every run. I've checked, and the indices it complains about are in fact closed. It seems Curator is not filtering out closed indices, even though it should do so by default and has been explicitly told to. We have lots of closed indices; not having closed indices is not a viable solution to this problem.
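To be explicit about what I expect `filtertype: closed` with `exclude: True` to do: drop closed indices from the working list before any per-index stats request is made. A sketch of that expectation (the name-to-state mapping here is hypothetical, standing in for whatever Curator learns from the cluster state, not Curator's internals):

```python
# Sketch of what I expect `filtertype: closed` / `exclude: True` to do:
# remove closed indices from the working list *before* any _stats request.
# The name -> state mapping is a hypothetical stand-in for cluster metadata.
def drop_closed(index_states):
    return [name for name, state in index_states.items() if state != "close"]

index_states = {
    "linux-syslog-2025-02": "open",
    "httpd-access-2023-14": "close",  # the index the 400 complains about
}
print(drop_closed(index_states))  # ['linux-syslog-2025-02']
```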
I did a DEBUG run of the first action above with Curator 5.8.4, and from the output I can see that it never attempts a GET request for _stats/store,docs on the index httpd-access-2023-14, as 7.0.1 does.
I'm confused/surprised that Curator gathers lots of information about indices which do not match the prefix specified in the action, but even if it only looked at indices matching the prefix value it would still encounter closed ones. (A DEBUG run with Curator 5.8.4 also shows it making the same GET request for _stats/store,docs about indices which aren't relevant to the action.)
Can anyone explain what's going on here and how to make Curator 7.0.1 not fail because of closed indices?
(I am aware of Index Lifecycle Management; it seems like it would be
a huge effort to move over to it, and I would very much prefer not to have
to deal with that at present.)