I'm using Curator with Elasticsearch 6.3.0 to run a shrink action (with delete_after) on indices after they've been rolled over.
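For context, the client config I point Curator at is minimal; it looks roughly like this (the host and log path here are placeholders, not my real values):

client:
  hosts:
    - 127.0.0.1
  port: 9200
  timeout: 30

logging:
  loglevel: INFO
  logfile: D:\curator\curator.log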
The run appears to succeed, and the Curator log shows no errors. But the cluster log contains a warning that the index can't be deleted, and when I check the disk the index folder is still there, so the space hasn't been freed either.
[2018-11-21T12:09:28,605][WARN ][o.e.i.IndicesService ] [ELK-ES-A] [logstash-syslog-2018.09.07-001/gv8gx3KhSKWTG6c6OmpH5g] failed to delete index
java.io.IOException: could not remove the following files (in the order of attempts):
D:\ES\data\nodes\0\indices\gv8gx3KhSKWTG6c6OmpH5g\0\index\_1qvm.dim: java.nio.file.AccessDeniedException: D:\ES\data\nodes\0\indices\gv8gx3KhSKWTG6c6OmpH5g\0\index\_1qvm.dim
D:\ES\data\nodes\0\indices\gv8gx3KhSKWTG6c6OmpH5g\0\index\_1qvm.fdt: java.nio.file.AccessDeniedException: D:\ES\data\nodes\0\indices\gv8gx3KhSKWTG6c6OmpH5g\0\index\_1qvm.fdt
D:\ES\data\nodes\0\indices\gv8gx3KhSKWTG6c6OmpH5g\0\index\_1qvm_Lucene50_0.doc: java.nio.file.AccessDeniedException: D:\ES\data\nodes\0\indices\gv8gx3KhSKWTG6c6OmpH5g\0\index\_1qvm_Lucene50_0.doc
D:\ES\data\nodes\0\indices\gv8gx3KhSKWTG6c6OmpH5g\0\index\_1qvm_Lucene50_0.pos: java.nio.file.AccessDeniedException: D:\ES\data\nodes\0\indices\gv8gx3KhSKWTG6c6OmpH5g\0\index\_1qvm_Lucene50_0.pos
...
After that, it just logs the following every 10 seconds for about 30 minutes:
[2018-11-21T12:09:28,727][WARN ][o.e.i.IndicesService ] [ELK-ES-A] [logstash-syslog-2018.09.07-001/gv8gx3KhSKWTG6c6OmpH5g] still pending deletes present for shards [[[logstash-syslog-2018.09.07-001/gv8gx3KhSKWTG6c6OmpH5g]], [[logstash-syslog-2018.09.07-001/gv8gx3KhSKWTG6c6OmpH5g]][0], [[logstash-syslog-2018.09.07-001/gv8gx3KhSKWTG6c6OmpH5g]][1]] - retrying
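Incidentally, if I'm reading the Elasticsearch source right, that roughly 30-minute retry window matches the 30m default of the indices.store.delete.shard.timeout node setting, which seems to bound how long pending deletes are retried. I assume (untested) it could be raised in elasticsearch.yml:

indices.store.delete.shard.timeout: 60m

but that would presumably only stretch the retry loop, not explain the AccessDeniedException.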
My Curator action file looks like this:
actions:
  1:
    action: rollover
    description: >-
      Rollover logstash-syslog_write alias if it is bigger than 4gb
      or older than 28d
    options:
      name: logstash-syslog_write
      disable_action: False
      ignore_empty_list: True
      conditions:
        max_size: 4gb
        max_age: 28d
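        # either of these conditions on its own triggers the rollover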
  2:
    action: index_settings
    description: >-
      Set number of replicas to 0 on rolled over logstash-* indices
      to prepare for shrinking
    options:
      disable_action: False
      ignore_empty_list: True
      index_settings:
        index:
          number_of_replicas: 0
    filters:
      - filtertype: alias
        aliases:
          - logstash-fw-syslog_write
          - logstash-syslog_write
          - logstash-vpn-syslog_write
        exclude: True
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: pattern
        kind: suffix
        value: -archive
        exclude: True
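    # note: the shrink below needs a complete copy of every shard on the
    # shrink node, which is why replicas are dropped to 0 in action 2 first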
  3:
    action: shrink
    description: >-
      Shrink rolled over logstash-* indices on ELK-ES-A.
      Delete each source index after successful shrink,
      then reroute the shrunk index with the provided parameters.
    options:
      disable_action: False
      ignore_empty_list: True
      shrink_node: ELK-ES-A
      node_filters:
        permit_masters: True
      number_of_shards: 1
      number_of_replicas: 0
      shrink_prefix: ''
      shrink_suffix: '-archive'
      delete_after: True
      post_allocation:
        allocation_type: require
        key: 'node_type'
        value: 'cold'
      wait_for_active_shards: all
      extra_settings:
        settings:
          index.codec: best_compression
          index.refresh_interval: 1m
      wait_for_completion: True
      wait_for_rebalance: True
      wait_interval: 9
      max_wait: -1
      timeout_override: 21600
    filters:
      - filtertype: alias
        aliases:
          - logstash-fw-syslog_write
          - logstash-syslog_write
          - logstash-vpn-syslog_write
        exclude: True
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: pattern
        kind: suffix
        value: -archive
        exclude: True
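To make the naming concrete: with shrink_prefix: '' and shrink_suffix: '-archive', a source index like logstash-syslog-2018.09.07-001 (the one in the log above) should be shrunk into logstash-syslog-2018.09.07-001-archive on ELK-ES-A, after which delete_after removes the source and post_allocation pins the shrunk index to node_type: cold.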
Elasticsearch is running as a Windows service under the System account, so the AccessDeniedException shouldn't be caused by security permissions. Also, I don't get this error if I just delete an index; it only appears after shrinking.
Any help is greatly appreciated. If this is fixed by upgrading Elasticsearch, I can do that as well.