Curator delete_indices fails


(Rémi Desgrange) #1

Hi,

I'm aware of this topic:

I'm using ES 6.2.4 and Curator 5.5.4.

My config.yml is:

client:
  hosts:
    - es1.corp.net
    - es2.corp.net
    - es3.corp.net
  port: 9200
  timeout: 30
logging:
  loglevel: DEBUG 

actions.yml

actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 30 days, metricbeat
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: metricbeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30

This Curator config runs every day in a fresh Docker container. It has not worked since I upgraded from 6.2.3 to 6.2.4.

import elasticsearch

client = elasticsearch.Elasticsearch(hosts='es1.corp.net')

client.indices.stats(index='metricbeat-6.2.3-2018.05.24', metric='store,docs')
{'_shards': {'total': 2, 'successful': 2, 'failed': 0},
 '_all': {'primaries': {'docs': {'count': 766735, 'deleted': 0},
   'store': {'size_in_bytes': 281425602}},
  'total': {'docs': {'count': 1533470, 'deleted': 0},
   'store': {'size_in_bytes': 563361892}}},
 'indices': {'metricbeat-6.2.3-2018.05.24': {'primaries': {'docs': {'count': 766735,
     'deleted': 0},
    'store': {'size_in_bytes': 281425602}},
   'total': {'docs': {'count': 1533470, 'deleted': 0},
    'store': {'size_in_bytes': 563361892}}}}}

and (partial) debug log:

2018-06-11 12:33:58,867 DEBUG              curator.utils             get_client:803  kwargs = {'url_prefix': '', 'aws_secret_key': None, 'http_auth': None, 'certificate': None, 'aws_key': None, 'aws_sign_request': False, 'port': 9200, 'hosts': ['es1.corp.net', 'es2.corp.net', 'es3.corp.net'], 'timeout': 30, 'aws_token': None, 'use_ssl': False, 'master_only': False, 'client_cert': None, 'ssl_no_validate': False, 'client_key': None}
2018-06-11 12:33:58,870 DEBUG              curator.utils             get_client:878  "requests_aws4auth" module present, but not used.
2018-06-11 12:33:58,879 DEBUG              curator.utils          check_version:689  Detected Elasticsearch version 6.2.4
2018-06-11 12:33:58,879 DEBUG                curator.cli                    run:159  client is <class 'elasticsearch.client.Elasticsearch'>
2018-06-11 12:33:58,879 INFO                 curator.cli                    run:165  Trying Action ID: 1, "delete_indices": Delete indices older than 30 days, metricbeat
2018-06-11 12:33:58,879 DEBUG                curator.cli         process_action:44   Configuration dictionary: {'action': 'delete_indices', 'description': 'Delete indices older than 30 days, metricbeat', 'filters': [{'exclude': False, 'kind': 'prefix', 'filtertype': 'pattern', 'value': 'metricbeat-'}, {'direction': 'older', 'stats_result': 'min_value', 'filtertype': 'age', 'source': 'name', 'epoch': None, 'timestring': '%Y.%m.%d', 'exclude': False, 'unit_count': 30, 'unit': 'days'}], 'options': {}}
2018-06-11 12:33:58,880 DEBUG                curator.cli         process_action:45   kwargs: {'master_timeout': 30, 'dry_run': False}
2018-06-11 12:33:58,880 DEBUG                curator.cli         process_action:50   opts: {}
2018-06-11 12:33:58,880 DEBUG                curator.cli         process_action:62   Action kwargs: {'master_timeout': 30}
2018-06-11 12:33:58,880 DEBUG                curator.cli         process_action:91   Running "DELETE_INDICES"
2018-06-11 12:33:58,881 DEBUG          curator.indexlist          __get_indices:66   Getting all indices
2018-06-11 12:33:58,962 DEBUG              curator.utils            get_indices:644  Detected Elasticsearch version 6.2.4

2018-06-11 12:33:59,017 DEBUG          curator.indexlist     __build_index_info:81   Building preliminary index metadata for .monitoring-kibana-6-2018.01.21
2018-06-11 12:33:59,017 DEBUG          curator.indexlist          _get_metadata:175  Getting index metadata
2018-06-11 12:33:59,017 DEBUG          curator.indexlist       empty_list_check:224  Checking for empty list
2018-06-11 12:34:04,021 DEBUG          curator.indexlist       _get_index_stats:115  Getting index stats
2018-06-11 12:34:04,021 DEBUG          curator.indexlist       empty_list_check:224  Checking for empty list
2018-06-11 12:34:04,021 DEBUG          curator.indexlist           working_list:235  Generating working list of indices
2018-06-11 12:34:04,022 DEBUG          curator.indexlist           working_list:235  Generating working list of indices
2018-06-11 12:34:04,028 ERROR                curator.cli                    run:184  Failed to complete action: delete_indices.  <type 'exceptions.KeyError'>: 'indices'

Do you have an explanation? Thanks in advance.


(Aaron Mildenstein) #2

How was curator installed in the docker container?

We'll see a lot more if you add an empty blacklist to your logging section, as follows:

logging:
  loglevel: DEBUG
  blacklist: []

The default behavior does not show the elasticsearch and urllib3 log traffic.


(Rémi Desgrange) #3

I use this container in prod: https://hub.docker.com/r/bobrik/curator/

For testing, though, I created a clean virtualenv on my machine and installed Curator with pip:

virtualenv -p python2 .venv
source .venv/bin/activate
pip install elasticsearch-curator
.venv/bin/curator --version
curator, version 5.5.4

The complete log is 306 lines long, so I cannot paste all of it here. Which parts are relevant?

2018-06-11 17:02:41,979 DEBUG              curator.utils          check_version:689  Detected Elasticsearch version 6.2.4
2018-06-11 17:02:41,979 DEBUG                curator.cli                    run:161  client is <class 'elasticsearch.client.Elasticsearch'>
2018-06-11 17:02:41,979 INFO                 curator.cli                    run:167  Trying Action ID: 1, "delete_indices": Delete indices older than 30 days, metricbeat
2018-06-11 17:02:41,979 DEBUG                curator.cli         process_action:44   Configuration dictionary: {'action': 'delete_indices', 'description': 'Delete indices older than 30 days, metricbeat', 'filters': [{'exclude': False, 'kind': 'prefix', 'filtertype': 'pattern', 'value': 'metricbeat-'}, {'direction': 'older', 'stats_result': 'min_value', 'filtertype': 'age', 'source': 'name', 'epoch': None, 'timestring': '%Y.%m.%d', 'exclude': False, 'unit_count': 30, 'unit': 'days'}], 'options': {}}
2018-06-11 17:02:41,979 DEBUG                curator.cli         process_action:45   kwargs: {'master_timeout': 30, 'dry_run': False}
0, 'aws_token': None, 'use_ssl': False, 'master_only': False, 'client_cert': None, 'ssl_no_validate': False, 'client_key': None}
2018-06-11 17:02:41,972 DEBUG              curator.utils             get_client:878  "requests_aws4auth" module present, but not used.
2018-06-11 17:02:41,973 DEBUG         urllib3.util.retry               from_int:200  Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
2018-06-11 17:02:41,973 DEBUG     urllib3.connectionpool              _new_conn:208  Starting new HTTP connection (1): es1.corp.net
2018-06-11 17:02:41,978 DEBUG     urllib3.connectionpool          _make_request:396  http://es1.corp.net:9200 "GET / HTTP/1.1" 200 435
2018-06-11 17:02:41,979 INFO               elasticsearch    log_request_success:83   GET http://es1.corp.net:9200/ [status:200 request:0.006s]
2018-06-11 17:02:41,979 DEBUG              elasticsearch    log_request_success:85   > None
2018-06-11 17:02:41,979 DEBUG              elasticsearch    log_request_success:86   < {
  "name" : "es1",
  "cluster_name" : "corp-es-cluster",
  "cluster_uuid" : "O0IQki3oQ5KsxBhoYvoYNQ",
  "version" : {
    "number" : "6.2.4",
    "build_hash" : "ccec39f",
    "build_date" : "2018-04-12T20:37:28.497551Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

2018-06-11 17:02:41,980 DEBUG         urllib3.util.retry               from_int:200  Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
2018-06-11 17:02:41,980 DEBUG     urllib3.connectionpool              _new_conn:208  Starting new HTTP connection (1): es3.corp.net
2018-06-11 17:02:42,000 DEBUG     urllib3.connectionpool          _make_request:396  http://es3.corp.net:9200 "GET /_all/_settings?expand_wildcards=open%2Cclosed HTTP/1.1" 200 64398
2018-06-11 17:02:42,003 INFO               elasticsearch    log_request_success:83   GET http://es3.corp.net:9200/_all/_settings?expand_wildcards=open%2Cclosed [status:200 request:0.023s]
2018-06-11 17:02:42,003 DEBUG              elasticsearch    log_request_success:85   > None
2018-06-11 17:02:42,003 DEBUG              elasticsearch    log_request_success:86   < {"metricbeat-6.0.1-2018.05.11":{" BLABLA


2018-06-11 17:02:46,616 WARNING            elasticsearch       log_request_fail:97   GET http://es2.fibrea.net:9200/.monitoring-es-6-2018.01.23,.monitoring-es-6-2018.01.25,.monitoring-es-6-2018.02.01, ETC...
[status:404  request:0.005s]
2018-06-11 17:02:46,616 DEBUG              elasticsearch       log_request_fail:105  > None
2018-06-11 17:02:46,617 DEBUG              elasticsearch       log_request_fail:110  < {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"metricbeat-6.0.1-2018.05.10","index_uuid":"_na_","index":"metricbeat-6.0.1-2018.05.10"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"metricbeat-6.0.1-2018.05.10","index_uuid":"_na_","index":"metricbeat-6.0.1-2018.05.10"},"status":404}
2018-06-11 17:02:46,617 ERROR                curator.cli                    run:186  Failed to complete action: delete_indices.  <type 'exceptions.KeyError'>: 'indices'

As for the 404 error, I checked _cluster/health and there are no unassigned shards :frowning:


(Aaron Mildenstein) #4

This indicates that there's something amiss in your cluster. Something says that metricbeat-6.0.1-2018.05.10 exists to Curator (an API call), but then when it issues another API call to delete it, Elasticsearch is responding with {"type":"index_not_found_exception","reason":"no such index". It's not there. Are all three hosts in your config.yml part of the same cluster?

  hosts:
    - es1.corp.net
    - es2.corp.net
    - es3.corp.net

If these are not all members of the same cluster, and that index is not on all members, that would result in the "not found" response. Curator round-robins the requests. The first request hits the first host, and then the delete hits the second. This is the most likely explanation of what is happening.
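One way to test the same-cluster question is to ask every host in config.yml for its cluster_uuid, which the root endpoint (GET /) returns, and compare the answers. The sketch below is only an illustration and not part of Curator. It uses the Python 3 standard library, fetch_root_info and group_hosts_by_cluster are hypothetical helper names, and it assumes plain HTTP on port 9200 without authentication, as in the config above.

```python
import json
from urllib.request import urlopen


def fetch_root_info(host, port=9200, timeout=10):
    """GET http://host:port/ and return the decoded JSON body.

    Assumes plain HTTP and no authentication (an assumption of this sketch).
    """
    with urlopen("http://%s:%d/" % (host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def group_hosts_by_cluster(hosts, fetch=fetch_root_info):
    """Return a dict mapping each reported cluster_uuid to its hosts.

    `fetch` is injectable so the grouping logic can be exercised
    without a live cluster.
    """
    groups = {}
    for host in hosts:
        uuid = fetch(host)["cluster_uuid"]
        groups.setdefault(uuid, []).append(host)
    return groups
```

Calling group_hosts_by_cluster(['es1.corp.net', 'es2.corp.net', 'es3.corp.net']) should return a single key if all hosts share one cluster. More than one key means the hosts belong to different clusters. A matching UUID is not proof of membership, though: a node that has dropped out of its cluster may still report the old UUID.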


(Rémi Desgrange) #5

Yes, all these hosts are part of the same cluster, and there are no unassigned shards. That's what bothers me :thinking:

If I retry with just one host, it works... So maybe it's a problem with my ES cluster and not with Curator.

Thanks for your help. My first conclusion was the same as yours, but I could not believe there was an index inconsistency in the cluster. Apparently there can be :frowning:

EDIT: OK... I'm so ashamed of this. One of the hosts had dropped out of the cluster and I didn't notice. It didn't raise an alarm in the monitoring system, so it was completely my fault. Sorry.
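A check along these lines might catch a node that has silently left the cluster: ask each host for _cluster/health and compare the number_of_nodes it reports with the number you expect. This is a minimal sketch using the Python 3 standard library, not a Curator feature. fetch_cluster_health and hosts_with_unexpected_node_count are hypothetical names, and it assumes plain HTTP on port 9200 without authentication.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(host, port=9200, timeout=10):
    """GET http://host:port/_cluster/health and return the decoded JSON body.

    Assumes plain HTTP and no authentication (an assumption of this sketch).
    """
    url = "http://%s:%d/_cluster/health" % (host, port)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def hosts_with_unexpected_node_count(hosts, expected_nodes,
                                     fetch=fetch_cluster_health):
    """Return {host: reported_count} for hosts whose view of the cluster
    does not contain `expected_nodes` nodes.

    Each host is queried separately, so a node that only sees itself
    shows up with a smaller count than the others.
    """
    suspicious = {}
    for host in hosts:
        count = fetch(host)["number_of_nodes"]
        if count != expected_nodes:
            suspicious[host] = count
    return suspicious
```

For the three hosts in this thread, hosts_with_unexpected_node_count(['es1.corp.net', 'es2.corp.net', 'es3.corp.net'], 3) would be expected to return an empty dict when the cluster is healthy. Querying every host individually matters here, because asking only one healthy node would not reveal that another node has a different view.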


(andy_zhou) #6

not find indices... with --runing test..

