Curator delete_indices. <type 'exceptions.KeyError'>: 'indices'

Hi, I'm running a Curator job and once in a while I get the following error...

Failed to complete action: delete_indices. <type 'exceptions.KeyError'>: 'indices'

Otherwise it works.

Here is the action...

actions:
  2:
    action: delete_indices
    description: Delete indices that are older than 1 week.
    options:
      timeout_override: 300
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: foo-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: weeks
      unit_count: 1

What version of Curator?

What version of Elasticsearch?

What other log lines can you provide? (DEBUG level will definitely be more informative.)

Hi,

Curator 5.5
Elastic 6.2.3

Full logs:

2018-04-05 01:00:01,952 INFO      Preparing Action ID: 2, "delete_indices"
2018-04-05 01:00:01,967 INFO      Trying Action ID: 2, "delete_indices": Delete DC/OS indices that are older than 1 week.
2018-04-05 01:00:07,771 ERROR     Failed to complete action: delete_indices.  <type 'exceptions.KeyError'>: 'indices'

I will need DEBUG logs to see what's going on there. The default INFO is not sufficient.

OK, I'll get back to you when the job runs and possibly fails again.

@theuntergeek here it is: https://www.dropbox.com/s/p6msekpvpmfn0s7/curator.stdout?dl=0

Okay, I am severely puzzled. There are only three places in the code which could result in the error you're seeing. Here is one. The second one is called here, and the last one is called here. One of these is causing the issue. What that means, however, is that you have an API call that is not returning the expected data.
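
To illustrate the failure mode (this is just a sketch, not the actual Curator code): each of those spots effectively does response['indices'], so an API response that somehow comes back without that key produces exactly the error you're seeing.

# Minimal sketch, not Curator's code: indexing into an API response that
# unexpectedly lacks the 'indices' key raises KeyError: 'indices', which
# Curator then reports as a failed action.
incomplete_response = {'_shards': {'total': 10, 'successful': 5, 'failed': 0}}
try:
    per_index = incomplete_response['indices']
except KeyError as err:
    print('Failed to complete action: delete_indices.', repr(err))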

It should not be possible for those API calls to not have the 'indices' key:

$ python3
Python 3.6.4 (default, Mar 26 2018, 19:37:24)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import elasticsearch
>>> client = elasticsearch.Elasticsearch()
>>> client.indices.create('foo')
{'acknowledged': True, 'shards_acknowledged': True}
>>> client.indices.stats(index='foo', metric='store,docs')
{'_shards': {'total': 10, 'successful': 5, 'failed': 0}, '_all': {'primaries': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}, 'total': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}}, 'indices': {'foo': {'primaries': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}, 'total': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}}}}
>>> client.indices.stats(index='foo', metric='store,docs').keys()
dict_keys(['_shards', '_all', 'indices'])
>>> client.indices.stats(index='foo', metric='store,docs')['indices']
{'foo': {'primaries': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}, 'total': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}}}
>>> client.cluster.state(index='foo', metric='metadata')['metadata']['indices']
{'foo': {'state': 'open', 'settings': {'index': {'creation_date': '1523285826585', 'number_of_shards': '5', 'number_of_replicas': '1', 'uuid': 'uSjJoCSPRFq4b9Mbyg8K5w', 'version': {'created': '5030099'}, 'provided_name': 'foo'}}, 'mappings': {}, 'aliases': [], 'primary_terms': {'0': 1, '1': 1, '2': 1, '3': 1, '4': 1}, 'in_sync_allocations': {'2': ['6D5FrZKeTiSRe9uwXAXrkQ'], '3': ['jHndJWgHSmO1XZdbg8eKUA'], '4': ['AzWYz7PiQo-H4D_wwUW-fQ'], '1': ['SOAQ7qAWSMWult-EDH6GvA'], '0': ['KsiZ37XlSDOUSH0BPz5NQg']}}}
>>> client.indices.segments(index='foo')['indices']
{'foo': {'shards': {'0': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}], '1': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}], '2': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}], '3': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}], '4': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}]}}}

All this shows is that on a clean node, with no indices to start, I create an empty index and still get the 'indices' key back from the very calls Curator makes. Something really weird is going on if any one of these API calls cannot find the 'indices' key. To be clear about how unusual this is: this code is heavily tested by Travis CI across all supported minor releases of Elasticsearch, with Curator running under each of Python 2.7, 3.5, and 3.6, and I cannot make it fail the way you are seeing. (And yes, I can see that you're on 6.2.3 while my latest test iteration is on 6.2.2, but that level of API change should not exist in a patch release.)

Is there something else about your cluster that I should know about? Where/how is it hosted? How is Curator installed?

Hi, thanks...

Actually I am on 6.2.2 (my bad).
I'm running Curator inside Docker on DC/OS and scheduling the job once a day using DC/OS's Metronome scheduler.

Docker file: https://www.dropbox.com/s/sbgrx5yvtu90pju/dockerfile-dev?dl=0
Conf file: https://www.dropbox.com/s/sb0205ufbetdrxb/curator-dev.yml?dl=0
Action file: https://www.dropbox.com/s/rc1k4to8srlm4ry/delete-dev.yml?dl=0

In the conf file I'm connecting to the Elasticsearch coordinating node over a load-balanced address, if that helps. Also, like I said, sometimes the job works and sometimes it fails.

The load balancer could have something to do with it, but it would be very hard to confirm that diagnosis. It would involve proving that the API call gets interrupted somehow when called across that load balanced IP. Is there any way to connect directly, without a load balanced IP?
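
If you want to poke at it yourself, something like this would show whether the stats call ever comes back without the 'indices' key depending on which address you go through (the hostnames are placeholders for your VIP and a real node):

# Diagnostic sketch: run the same stats call through the load-balanced VIP
# and through a direct node address, and report whether the response
# contains the 'indices' key. Hostnames are placeholders.
import elasticsearch

for host in ('es-vip.example.local', 'es-node-1.example.local'):
    client = elasticsearch.Elasticsearch(hosts=[host], timeout=300)
    response = client.indices.stats(index='foo-*', metric='store,docs')
    print(host, "has 'indices':", 'indices' in response)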

The LB is a layer 4 load balancer using Linux VIPs.

But I can try a direct connection just to be sure... Also, is it better to connect to the coordinating node or a data node?

It won't matter, just so long as the connection is direct.

OK, using a direct address and will let you know...

But, just curious...

Does a single action connect to a node multiple times? So when using a load-balanced VIP, it may connect to a node that doesn't have the state it needs, or am I completely wrong here?

It would load balance connections, yes, which is exactly why this is so hard to trace. Since you're using Docker, I can't know for sure whether it's just a delay in the cluster state or something else. In other load-balancing scenarios it works okay, so long as all nodes are members of the same cluster (some users misunderstood the hosts setting as being for load balancing and failover, and thought they could get Curator to act on multiple clusters this way).
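
For what it's worth, here is a rough sketch of that point: the Python client distributes requests across whatever you list in hosts, so every entry has to belong to the same cluster (the node names below are placeholders).

# Sketch only: the client spreads requests across the listed hosts, so they
# must all be members of one cluster. Listing nodes from different clusters
# does not make Curator act on multiple clusters; it just makes successive
# calls hit different clusters.
import elasticsearch

client = elasticsearch.Elasticsearch(
    hosts=['node1.example.local', 'node2.example.local'],  # same cluster only
)
print(client.info()['cluster_name'])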

Ah ok, no, I'm on one cluster and the DC/OS lb is really just linux virtual IPs.

Does it connect once and complete a full action, or will it connect multiple times per action?

It creates a new client connection for each action in an action file. This is how it is able to apply the timeout_override.
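
Roughly speaking (a simplification, not the actual Curator source), it is the equivalent of building a fresh client per action with that action's timeout:

# Simplified sketch, not Curator's real code: each action gets its own
# client, and timeout_override becomes that client's request timeout, so a
# long delete can get 300s while other actions keep the normal default.
import elasticsearch

def client_for_action(hosts, options):
    timeout = options.get('timeout_override') or 30  # 30s stands in for the default
    return elasticsearch.Elasticsearch(hosts=hosts, timeout=timeout)

delete_client = client_for_action(['es-node-1.example.local:9200'],
                                  {'timeout_override': 300, 'ignore_empty_list': True})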

OK, so I don't see how it could not work with the LB... I'll let you know how the direct test goes.
