Curator delete_indices. <type 'exceptions.KeyError'>: 'indices'

javadevmtl · April 5, 2018, 2:44pm

Hi, running curator job and once a while I get the following error...

Failed to complete action: delete_indices. <type 'exceptions.KeyError'>: 'indices'

Otherwise it works.

Here is the action...

actions:
  2:
    action: delete_indices
    description: Delete indices that are older than 1 week.
    options:
      timeout_override: 300
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: foo-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: weeks
      unit_count: 1

theuntergeek · April 5, 2018, 4:39pm

What version of Curator?

What version of Elasticsearch?

What other log lines can you provide (DEBUG level will definitely be more informative).

javadevmtl · April 5, 2018, 6:05pm

Hi,

Curator 5.5
Elastic 6.2.3

Full logs:

2018-04-05 01:00:01,952 INFO      Preparing Action ID: 2, "delete_indices"
2018-04-05 01:00:01,967 INFO      Trying Action ID: 2, "delete_indices": Delete DC/OS indices that are older than 1 week.
2018-04-05 01:00:07,771 ERROR     Failed to complete action: delete_indices.  <type 'exceptions.KeyError'>: 'indices'

theuntergeek · April 5, 2018, 7:08pm

I will need DEBUG logs to see what's going on there. The default INFO is not sufficient.

javadevmtl · April 5, 2018, 8:01pm

Ok I'll get back to you when the Job runs and possibly fails again.

javadevmtl · April 9, 2018, 2:04pm

@theuntergeek here it is: https://www.dropbox.com/s/p6msekpvpmfn0s7/curator.stdout?dl=0

theuntergeek · April 9, 2018, 3:09pm

Okay, I am severely puzzled. There are only three places in the code which could result in the error you're seeing. Here is one. The second one is called here, and the last one is called here. One of these is causing the issue. What that means, however, is that you have an API call that is not returning the expected data.

It should not be possible for those API calls to not have the 'indices' key:

$ python3
Python 3.6.4 (default, Mar 26 2018, 19:37:24)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import elasticsearch
>>> client = elasticsearch.Elasticsearch()
>>> client.indices.create('foo')
{'acknowledged': True, 'shards_acknowledged': True}
>>> client.indices.stats(index='foo', metric='store,docs')
{'_shards': {'total': 10, 'successful': 5, 'failed': 0}, '_all': {'primaries': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}, 'total': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}}, 'indices': {'foo': {'primaries': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}, 'total': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}}}}
>>> client.indices.stats(index='foo', metric='store,docs').keys()
dict_keys(['_shards', '_all', 'indices'])
>>> client.indices.stats(index='foo', metric='store,docs')['indices']
{'foo': {'primaries': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}, 'total': {'docs': {'count': 0, 'deleted': 0}, 'store': {'size_in_bytes': 650, 'throttle_time_in_millis': 0}}}}
>>> client.cluster.state(index='foo', metric='metadata')['metadata']['indices']
{'foo': {'state': 'open', 'settings': {'index': {'creation_date': '1523285826585', 'number_of_shards': '5', 'number_of_replicas': '1', 'uuid': 'uSjJoCSPRFq4b9Mbyg8K5w', 'version': {'created': '5030099'}, 'provided_name': 'foo'}}, 'mappings': {}, 'aliases': [], 'primary_terms': {'0': 1, '1': 1, '2': 1, '3': 1, '4': 1}, 'in_sync_allocations': {'2': ['6D5FrZKeTiSRe9uwXAXrkQ'], '3': ['jHndJWgHSmO1XZdbg8eKUA'], '4': ['AzWYz7PiQo-H4D_wwUW-fQ'], '1': ['SOAQ7qAWSMWult-EDH6GvA'], '0': ['KsiZ37XlSDOUSH0BPz5NQg']}}}
>>> client.indices.segments(index='foo')['indices']
{'foo': {'shards': {'0': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}], '1': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}], '2': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}], '3': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}], '4': [{'routing': {'state': 'STARTED', 'primary': True, 'node': 'uNe7JnzBSDqbhbl7s-x_oQ'}, 'num_committed_segments': 0, 'num_search_segments': 0, 'segments': {}}]}}}

All that this is showing is that on a clean node, with no indices to start, I create an empty index, and still get the 'indices' key back, just like those calls are making. Something really weird is going on if any one of these API calls is not able to find the indices key. To be further clear that this is in unusual territory, this code is heavily tested by Travis CI across all supported minor releases of Elasticsearch with Curator running each of Python 2.7, 3.5, and 3.6 and I cannot make it fail in the way you are seeing. (And yes, I can see that you're on 6.2.3, and my latest iteration is on 6.2.2, but that level of API change should not exist in a patch release).

Is there something else about your cluster that I should know about? Where/how is it hosted? How is Curator installed?

javadevmtl · April 9, 2018, 3:22pm

Hi, thanks...

Actually I am on 6.2.2 (my bad)
I'm running curator inside docker on DC/OS and scheduling the job once a day using the metronome scheduler of DC/OS.

Docker file: https://www.dropbox.com/s/sbgrx5yvtu90pju/dockerfile-dev?dl=0
Conf file: https://www.dropbox.com/s/sb0205ufbetdrxb/curator-dev.yml?dl=0
Action file: https://www.dropbox.com/s/rc1k4to8srlm4ry/delete-dev.yml?dl=0

In the conf file I'm connecting to the elastic coordinator node over a load balanced address if that helps. Also like I said sometimes the job works and sometimes it fails.

theuntergeek · April 9, 2018, 3:31pm

The load balancer could have something to do with it, but it would be very hard to confirm that diagnosis. It would involve proving that the API call gets interrupted somehow when called across that load balanced IP. Is there any way to connect directly, without a load balanced IP?

javadevmtl · April 9, 2018, 3:46pm

The lb is a level 4 lb using linux VIPs.

But I can try a direct connection just to be sure... Also is it better to connect to the coordinator or the data node?

theuntergeek · April 9, 2018, 3:58pm

It won't matter, just so long as the connection is direct.

javadevmtl · April 9, 2018, 6:06pm

Ok using direct address and will let you know...

But, just curious...

Does a single action connect multiple times to a node? So when using load balanced VIP it may connect to a node that doesn't have the state it needs or I'm completely wrong here?

theuntergeek · April 9, 2018, 6:21pm

It would load balance connections, yes, which is exactly why this is so hard to trace. Since you're using Docker, I can't know for sure if it's just a delay in the cluster state, or something else. In other load balancing scenarios, it works okay, so long as all nodes are members of the same cluster (some users misunderstood that hosts was for load balancing and fail over, and thought they could get Curator to act on multiple clusters this way).

javadevmtl · April 9, 2018, 7:32pm

Ah ok, no, I'm on one cluster and the DC/OS lb is really just linux virtual IPs.

javadevmtl · April 10, 2018, 4:43am

It connects once a completes a full action or it will connect multiple times per action?

theuntergeek · April 10, 2018, 2:03pm

It creates a new client connection for each action in an action file. This is how it is able to do the timeout_override.

javadevmtl · April 10, 2018, 3:21pm

Ok so I dont see how it could not work with lb... I'll let you how the direct test goes.

system · May 8, 2018, 3:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Curator delete_indices fail Elasticsearch	6	3244	July 10, 2018
Elasticsearch Curator error Elasticsearch curator	7	1003	March 17, 2020
Unable to delete indices using curator Elasticsearch	9	3359	March 27, 2018
Curator no indices found Elasticsearch	14	1809	July 29, 2019
While using curator for bulk size elasticsearch data. curator throw error Elasticsearch curator	4	546	June 17, 2020

Curator delete_indices. <type 'exceptions.KeyError'>: 'indices'

Related topics