Failed to complete action: delete_indices. <class 'KeyError'>: 'indices'

Hello.

First of all, thanks to the community, the forum, and the documentation that have helped me so far with Elasticsearch & co. I now use Curator as well, and I have hit a severe issue I can't resolve.

Elasticsearch: 6.1.2
Curator: 5.8.1

Error message:

2019-12-09 16:23:52,727 INFO      Preparing Action ID: 1, "delete_indices"
2019-12-09 16:23:52,727 INFO      Creating client object and testing connection
2019-12-09 16:23:52,731 INFO      Instantiating client object
2019-12-09 16:23:52,731 INFO      Testing client connectivity
2019-12-09 16:23:52,762 INFO      Successfully created Elasticsearch client object with provided settings
2019-12-09 16:23:52,778 INFO      Trying Action ID: 1, "delete_indices": Delete too old indices test-*
2019-12-09 16:23:56,784 ERROR     Failed to complete action: delete_indices.  <class 'KeyError'>: 'indices'
2019-12-09 16:23:56,784 INFO      Continuing execution with next action because "continue_if_exception" is set to True for action delete_indices
2019-12-09 16:23:56,784 INFO      Action ID: 1, "delete_indices" completed.
2019-12-09 16:23:56,784 INFO      Job completed.

I am stuck. I really do not understand why it does not work in production or what this error means in my context. I thought it might be related to name conflicts among the index names, which is why I tried:

  • creating the same indices (8000+) in DEV & UAT, where it works fine and the cleanup runs;
  • a minimal configuration (which does nothing useful: it tries to delete test-* indices that do not exist) without a date pattern, and I still get the same error (see next post).

Of course the cluster is green, and I have no special hidden setup.
Thank you for your help.

Vincent.

2019-12-09 16:25:10,646 DEBUG                curator.cli                    run:110  Client and logging options validated.
2019-12-09 16:25:10,646 DEBUG                curator.cli                    run:114  default_timeout = 30
2019-12-09 16:25:10,646 DEBUG                curator.cli                    run:118  action_file: test.yml
2019-12-09 16:25:10,646 DEBUG                curator.cli                    run:120  action_config: {'actions': {1: {'action': 'delete_indices', 'description': 'Delete too old indices test-*', 'options': {'ignore_empty_list': True, 'timeout_override': None, 'continue_if_exception': True, 'disable_action': False}, 'filters': [{'filtertype': 'pattern', 'kind': 'prefix', 'value': 'test-'}]}}}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:26   Schema: {'actions': <class 'dict'>}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:27   "Actions File" config: {'actions': {1: {'action': 'delete_indices', 'description': 'Delete too old indices test-*', 'options': {'ignore_empty_list': True, 'timeout_override': None, 'continue_if_exception': True, 'disable_action': False}, 'filters': [{'filtertype': 'pattern', 'kind': 'prefix', 'value': 'test-'}]}}}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:26   Schema: {'action': Any(In(['alias', 'allocation', 'close', 'cluster_routing', 'create_index', 'delete_indices', 'delete_snapshots', 'forcemerge', 'freeze', 'index_settings', 'open', 'reindex', 'replicas', 'restore', 'rollover', 'shrink', 'snapshot', 'unfreeze']), msg="action must be one of ['alias', 'allocation', 'close', 'cluster_routing', 'create_index', 'delete_indices', 'delete_snapshots', 'forcemerge', 'freeze', 'index_settings', 'open', 'reindex', 'replicas', 'restore', 'rollover', 'shrink', 'snapshot', 'unfreeze']")}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:27   "action type" config: {'action': 'delete_indices', 'description': 'Delete too old indices test-*', 'options': {'ignore_empty_list': True, 'timeout_override': None, 'continue_if_exception': True, 'disable_action': False}, 'filters': [{'filtertype': 'pattern', 'kind': 'prefix', 'value': 'test-'}]}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:26   Schema: {'action': Any(In(['alias', 'allocation', 'close', 'cluster_routing', 'create_index', 'delete_indices', 'delete_snapshots', 'forcemerge', 'freeze', 'index_settings', 'open', 'reindex', 'replicas', 'restore', 'rollover', 'shrink', 'snapshot', 'unfreeze']), msg="action must be one of ['alias', 'allocation', 'close', 'cluster_routing', 'create_index', 'delete_indices', 'delete_snapshots', 'forcemerge', 'freeze', 'index_settings', 'open', 'reindex', 'replicas', 'restore', 'rollover', 'shrink', 'snapshot', 'unfreeze']"), 'description': Any(<class 'str'>, <class 'str'>, msg=None), 'options': <class 'dict'>, 'filters': <class 'list'>}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:27   "structure" config: {'action': 'delete_indices', 'description': 'Delete too old indices test-*', 'options': {'ignore_empty_list': True, 'timeout_override': None, 'continue_if_exception': True, 'disable_action': False}, 'filters': [{'filtertype': 'pattern', 'kind': 'prefix', 'value': 'test-'}]}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:26   Schema: {'allow_ilm_indices': Any(<class 'bool'>, All(Any(<class 'str'>, msg=None), <function Boolean at 0x000000A5F2BEF5E8>, msg=None), msg=None), 'continue_if_exception': Any(<class 'bool'>, All(Any(<class 'str'>, msg=None), <function Boolean at 0x000000A5F2BEF798>, msg=None), msg=None), 'disable_action': Any(<class 'bool'>, All(Any(<class 'str'>, msg=None), <function Boolean at 0x000000A5F2BEF948>, msg=None), msg=None), 'ignore_empty_list': Any(<class 'bool'>, All(Any(<class 'str'>, msg=None), <function Boolean at 0x000000A5F2BEFAF8>, msg=None), msg=None), 'timeout_override': Any(Coerce(int, msg=None), None, msg=None)}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:27   "options" config: {'ignore_empty_list': True, 'continue_if_exception': True, 'disable_action': False}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:26   Schema: <function Filters.<locals>.f at 0x000000A5F2BDB558>
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:27   "filters" config: [{'filtertype': 'pattern', 'kind': 'prefix', 'value': 'test-'}]
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:26   Schema: {'filtertype': Any(In(['age', 'alias', 'allocated', 'closed', 'count', 'empty', 'forcemerged', 'ilm', 'kibana', 'none', 'opened', 'pattern', 'period', 'shards', 'space', 'state']), msg="filtertype must be one of ['age', 'alias', 'allocated', 'closed', 'count', 'empty', 'forcemerged', 'ilm', 'kibana', 'none', 'opened', 'pattern', 'period', 'shards', 'space', 'state']"), 'kind': Any('prefix', 'suffix', 'timestring', 'regex', msg=None), 'value': Any(<class 'str'>, msg=None), 'exclude': Any(<class 'bool'>, All(Any(<class 'str'>, msg=None), <function Boolean at 0x000000A5F2BDBD38>, msg=None), msg=None)}
2019-12-09 16:25:10,646 DEBUG     curator.validators.SchemaCheck               __init__:27   "filter" config: {'filtertype': 'pattern', 'kind': 'prefix', 'value': 'test-'}
2019-12-09 16:25:10,646 DEBUG     curator.validators.filters                      f:48   Filter #0: {'filtertype': 'pattern', 'kind': 'prefix', 'value': 'test-', 'exclude': False}
2019-12-09 16:25:10,646 DEBUG                curator.cli                    run:123  Full list of actions: {1: {'action': 'delete_indices', 'description': 'Delete too old indices test-*', 'options': {'ignore_empty_list': True, 'continue_if_exception': True, 'disable_action': False, 'timeout_override': None, 'allow_ilm_indices': False}, 'filters': [{'filtertype': 'pattern', 'kind': 'prefix', 'value': 'test-', 'exclude': False}]}}
2019-12-09 16:25:10,646 DEBUG                curator.cli                    run:128  action_disabled = False
2019-12-09 16:25:10,662 DEBUG                curator.cli                    run:132  continue_if_exception = True
2019-12-09 16:25:10,662 DEBUG                curator.cli                    run:134  timeout_override = None
2019-12-09 16:25:10,662 DEBUG                curator.cli                    run:136  ignore_empty_list = True
2019-12-09 16:25:10,662 DEBUG                curator.cli                    run:138  allow_ilm_indices = False
2019-12-09 16:25:10,662 INFO                 curator.cli                    run:148  Preparing Action ID: 1, "delete_indices"
2019-12-09 16:25:10,662 INFO                 curator.cli                    run:162  Creating client object and testing connection
2019-12-09 16:25:10,662 DEBUG              curator.utils             get_client:809  kwargs = {'hosts': ['sdpprdap014.fr.world.socgen', 'sdpprdap016.fr.world.socgen', 'sdpprdap018.fr.world.socgen'], 'port': 9200, 'use_ssl': False, 'ssl_no_validate': False, 'master_only': False, 'certificate': None, 'aws_token': None, 'aws_key': None, 'client_key': None, 'aws_secret_key': None, 'http_auth': None, 'aws_sign_request': False, 'client_cert': None, 'url_prefix': '', 'timeout': 30}
2019-12-09 16:25:10,662 DEBUG              curator.utils             get_client:871  Checking for AWS settings
2019-12-09 16:25:10,662 DEBUG              curator.utils             get_client:886  "requests_aws4auth" module present, but not used.
2019-12-09 16:25:10,662 INFO               curator.utils             get_client:903  Instantiating client object
2019-12-09 16:25:10,662 INFO               curator.utils             get_client:906  Testing client connectivity
2019-12-09 16:25:10,678 DEBUG              curator.utils             get_client:907  Cluster info: {'name': 'SDPPRDAP018', 'cluster_name': 'SGME-PROD', 'cluster_uuid': 'icxT5QVHQUqidPvqDBu-OQ', 'version': {'number': '6.1.2', 'build_hash': '5b1fea5', 'build_date': '2018-01-10T02:35:59.208Z', 'build_snapshot': False, 'lucene_version': '7.1.0', 'minimum_wire_compatibility_version': '5.6.0', 'minimum_index_compatibility_version': '5.0.0'}, 'tagline': 'You Know, for Search'}
2019-12-09 16:25:10,678 INFO               curator.utils             get_client:908  Successfully created Elasticsearch client object with provided settings
2019-12-09 16:25:10,678 DEBUG              curator.utils             get_client:932  Checking Elasticsearch endpoint version...
2019-12-09 16:25:10,693 DEBUG              curator.utils          check_version:693  Detected Elasticsearch version 6.1.2
2019-12-09 16:25:10,693 DEBUG              curator.utils             get_client:951  Not verifying local master status (master_only: false)
2019-12-09 16:25:10,693 INFO                 curator.cli                    run:194  Trying Action ID: 1, "delete_indices": Delete too old indices test-*
2019-12-09 16:25:10,693 DEBUG                curator.cli         process_action:46   Configuration dictionary: {'action': 'delete_indices', 'description': 'Delete too old indices test-*', 'options': {}, 'filters': [{'filtertype': 'pattern', 'kind': 'prefix', 'value': 'test-', 'exclude': False}, {'filtertype': 'ilm'}]}
2019-12-09 16:25:10,693 DEBUG                curator.cli         process_action:47   kwargs: {'master_timeout': 30, 'dry_run': False}
2019-12-09 16:25:10,693 DEBUG                curator.cli         process_action:52   opts: {}
2019-12-09 16:25:10,693 DEBUG                curator.cli         process_action:64   Action kwargs: {'master_timeout': 30}
2019-12-09 16:25:10,693 DEBUG                curator.cli         process_action:93   Running "DELETE_INDICES"
2019-12-09 16:25:10,693 DEBUG          curator.indexlist          __get_indices:65   Getting all indices
2019-12-09 16:25:10,943 DEBUG              curator.utils            get_indices:648  Detected Elasticsearch version 6.1.2
2019-12-09 16:25:10,943 DEBUG              curator.utils            get_indices:650  All indices: ['metricbeat-6.1.1-2019.11.12',..., 'flow-fxct-2019.11.10']
2019-12-09 16:25:10,943 DEBUG          curator.indexlist     __build_index_info:80   Building preliminary index metadata for metricbeat-6.1.1-2019.11.12
...
2019-12-09 16:25:11,068 DEBUG          curator.indexlist     __build_index_info:80   Building preliminary index metadata for flow-fxct-2019.11.10
2019-12-09 16:25:11,068 DEBUG          curator.indexlist          _get_metadata:177  Getting index metadata
2019-12-09 16:25:11,068 DEBUG          curator.indexlist       empty_list_check:226  Checking for empty list
2019-12-09 16:25:21,830 DEBUG          curator.indexlist       _get_index_stats:117  Getting index stats
2019-12-09 16:25:21,830 DEBUG          curator.indexlist       empty_list_check:226  Checking for empty list
2019-12-09 16:25:21,830 DEBUG          curator.indexlist           working_list:237  Generating working list of indices
2019-12-09 16:25:21,830 DEBUG          curator.indexlist           working_list:237  Generating working list of indices
2019-12-09 16:25:21,846 ERROR                curator.cli                    run:213  Failed to complete action: delete_indices.  <class 'KeyError'>: 'indices'
2019-12-09 16:25:21,846 INFO                 curator.cli                    run:219  Continuing execution with next action because "continue_if_exception" is set to True for action delete_indices
2019-12-09 16:25:21,846 INFO                 curator.cli                    run:223  Action ID: 1, "delete_indices" completed.
2019-12-09 16:25:21,846 INFO                 curator.cli                    run:224  Job completed.

So, you've included the DEBUG logs, but not the configuration details and YAML. What version of Elasticsearch are you using? How old is the oldest index? When an index was created, and under what version of Elasticsearch, might have something to do with this message. To me, without further context, the error message implies that Curator expects a given key (indices) to be in a response, and that the response does not contain that key. This can happen with older indices, older versions of Elasticsearch, or insufficient permissions granted to the connecting user.

I can't help much, however, until I at least see the YAML of the action file.

Thanks @theuntergeek
Version is 6.1.2.
No permissions are set at all; everybody can read and write.
I will try to gather all other relevant info as soon as possible.

Here is the minimal action file (used for test purposes; it also causes the failure, which makes me think the problem is not related to the action file at all):

# Actions for test-* indices

actions:
  1:
    action: delete_indices
    description: >-
      Delete too old indices test-*
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: test-

And the client configuration YAML file:

# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - sdpprdap014.fr.world.socgen
    - sdpprdap016.fr.world.socgen
    - sdpprdap018.fr.world.socgen
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: DEBUG
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

How can I test your idea about a missing key (indices)? If I could find the bad index, that would be great.

About the oldest indices:

Request:
curl -XGET "http://sdpprdap014:9200/_cat/indices?h=h,s,i,id,p,r,dc,dd,ss,creation.date.string"

Response (after sorting):

green open .kibana                      6OH2lyuQSNGEKPXZn8sbkg 2 2       886      10      5mb 2018-02-22T08:49:02.073Z
green open connections-mm-2018          4p0qgtRRRK2OSzhdouq2GQ 2 2      4972     125      2mb 2018-04-25T21:00:34.236Z
green open connections-tca-2018         0BvlQdAgRzCJz34MzTnh7g 2 2      1837      31      9mb 2018-04-25T21:00:53.208Z
green open connections-flow-2018        gP_v5gGfQJmwLcQ9jtVkQQ 2 2     26295     136    7.9mb 2018-04-26T21:01:54.152Z
green open events-flow-2018             imCL4QbiSeCdbGsmzMGiVg 2 2    131970    1401    399mb 2018-04-26T21:02:05.943Z
green open events-fic-2018              7WWVBKWYTSCKeqWiQvwPZg 2 2      6371      46   12.3mb 2018-04-27T21:01:19.435Z
green open connections-fx-2018          5oG_YWCGRwOa4akr5qY2Dg 2 2     25936     356   31.7mb 2018-05-04T21:01:55.273Z
green open connections-sp-2018          aBuN02hiSe6OREk0cTs_ZQ 2 2    185896     899  150.1mb 2018-05-04T21:02:32.224Z
green open events-sp-2018               TlVlp3iFT4W1SKKX3_KP3A 2 2   1408231   21873    3.6gb 2018-05-04T21:02:44.294Z
green open connections-cty-2018         xEadDg8qSVywlnSgvr1DEQ 2 2     61086     514  111.2mb 2018-05-14T21:00:27.002Z
...

Thanks.

How many total indices do you have? Seeing you have data going back into 2018, the number could be quite high. You may, somehow, have tickled a bug in how Curator has to split very long lists of indices. The only hint I have at this moment is that working_list ("Generating working list of indices" in your log) is called twice in a row. This may yet be a red herring, however, as that call is made multiple times during the course of a normal Curator run.

In the meantime, it will be a lot more data, but we'll get more complete debug information if you set

blacklist: 

(i.e., leave the value empty). This will show us what calls are being made by both the elasticsearch and urllib3 modules. It's a lot of data, but we should at least get the list of indices it's trying, and failing, to delete, plus the response to the call that is missing the key (hopefully).
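
Concretely, that is the logging section of the client YAML quoted above with only the blacklist line changed:

logging:
  loglevel: DEBUG
  logfile:
  logformat: default
  blacklist:     # left empty, so elasticsearch and urllib3 calls are logged too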

Thanks for the tip about the empty blacklist parameter. Now I understand where the problem is. It is this request:

curl -XGET "http://sdpprdap016.fr.world.socgen:9200/.kibana,connections-fic-2019.11.25,connections-fic-2019.12.02,connections-flow-2018,connections-flow-2019.11.25,connections-flow-2019.11.29,connections-fx-20191004,connections-fx-20191005,connections-fx-20191009,connections-fx-20191030,connections-fx-20191113,connections-fx-20191208,connections-lisa-20191202,connections-mm-2018,connections-mm-20191007,connections-mm-20191018,connections-mm-20191022,connections-mm-20191107,connections-mm-20191126,connections-phoenix-20191123,connections-sp-2019.11.25,elastalert_status,events-eqdflow-20191030,events-eqdflow-20191101,events-eqdflow-20191210,events-flow-2019.11.26,events-flow-2019.12.05,events-flow-2019.12.10,events-fx-20191017,events-fx-20191020,events-fx-20191021,events-fx-20191022,events-fx-20191209,events-sp-2019.12.09,flow-fic-2019.11.04,flow-fic-2019.11.09,flow-fic-2019.11.15,flow-fic-2019.12.09,flow-fic-2019.12.10,flow-flow-2019.11.06,flow-flow-2019.11.17,flow-flow-2019.12.08,flow-flow-2019.12.10,flow-fx-2019.11.16,flow-fx-2019.11.27,flow-fx-2019.11.30,flow-fxct-2019.11.02,flow-fxct-2019.11.23,flow-fxct-2019.12.11,flow-mm-2019.11.07,flow-mm-2019.11.19,flow-myhedge-2019.11.04,flow-myhedge-2019.11.05,flow-myhedge-2019.11.13,flow-phoenix-2019.11.09,flow-phoenix-2019.11.25,flow-phoenix-2019.12.05,flow-phoenix-2019.12.10,flow-sp-2019.11.19,flow-sp-2019.12.06,iis-2019.11.12,inout-eqdflow-20191101,inout-eqdflow-20191104,inout-eqdflow-20191118,inout-eqdflow-20191130,inout-fic-20191119,inout-fic-20191203,inout-fx-20191101,inout-fx-20191117,inout-fx-20191210,inout-lisa-20191110,inout-lisa-20191205,inout-lisa-20191210,inout-lisa-20191211,inout-mm-20191120,inout-mm-20191125,inout-mm-20191126,inout-mm-20191202,inout-phoenix-20191102,inout-phoenix-20191202,inout-sp-20191109,inout-sp-20191110,inout-sp-20191116,inout-sp-20191123,logs-ctyw-2019.11.10,logs-ctyw-2019.11.22,logs-ctyw-2019.12.03,logs-fic-20191121,logs-fic-20191122,logs-fic-20191126,logs-fic-20191128,logs-flow-2019.11.14,logs-flow-2019.11.16,logs-flow-2019.11.19,logs-flow-2019.11.21,logs-flow-2019.11.24,logs-flow-2019.11.29,logs-flow-2019.12.09,logs-fx-2019.11.22,logs-fx-20191113,logs-fxct-2019.11.02,logs-fxct-2019.11.04,logs-fxct-2019.11.05,logs-fxct-2019.11.06,logs-fxct-2019.11.10,logs-fxct-2019.12.01,logs-fxct-2019.12.09,logs-lisa-20191118,logs-lisa-20191130,logs-lisa-20191206,logs-lisa-20191208,logs-mm-2019.12.01,logs-mm-2019.12.07,logs-mm-2019.12.11,logs-mm-20191109,logs-mm-20191116,logs-mm-20191118,logs-mm-20191204,logs-myhedge-2019.11.14,logs-myhedge-2019.11.18,logs-myhedge-2019.12.10,logs-phoenix-2019.11.02,logs-phoenix-2019.11.22,logs-phoenix-2019.11.26,logs-phoenix-20191107,logs-phoenix-20191121,logs-phoenix-20191124,logs-phoenix-20191129,logs-phoenix-20191202,logs-sp-20191104,logs-sp-20191107,logs-sp-20191120,logs-sp-20191130,logs-sp-20191202,logs-sp-20191204,logs-sp-20191207,metricbeat-6.1.1-2019.12.07,metricbeat-6.1.1-2019.12.11,metricbeat-6.1.2-2019.11.03,metricbeat-6.1.2-2019.11.04,metricbeat-6.1.2-2019.11.15,metricbeat-6.1.2-2019.11.24,nirvana-2019.11.13,nirvana-2019.11.15,nirvana-2019.11.21,nirvana-2019.11.26/_stats/store,docs"

It throws an exception. And if I send it from curl, it is also rejected:

* Recv failure: Connection was aborted
* Closing connection 0
curl: (56) Recv failure: Connection was aborted

I made 146 requests with curl, one per index, and they all work. So for some reason Elasticsearch aborts the connection, maybe because it thinks the response would be too heavy... Is there a way to reduce the batch size on the Curator side (fewer indices at a time for the _stats/store,docs call), or do you have any other workaround in mind?

Thanks.

I tried different things with different indices in the request. So far my only guess is that there is some limit on the length of the index list we can send to ELS, but I am really not sure about that.

This one works:

curl -XGET "http://sdpprdap016.fr.world.socgen:9200/logs-ctyw-2019.11.10,logs-ctyw-2019.11.22,logs-ctyw-2019.12.03,logs-fic-20191121,logs-fic-20191122,logs-fic-20191126,logs-fic-20191128,logs-flow-2019.11.14,logs-flow-2019.11.16,logs-flow-2019.11.19,logs-flow-2019.11.21,logs-flow-2019.11.24,logs-flow-2019.11.29,logs-flow-2019.12.09,logs-fx-2019.11.22,logs-fx-20191113,logs-fxct-2019.11.02,logs-fxct-2019.11.04,logs-fxct-2019.11.05,logs-fxct-2019.11.06,logs-fxct-2019.11.10,logs-fxct-2019.12.01,logs-fxct-2019.12.09,logs-lisa-20191118,.kibana,.kibana/_stats/store,docs"

This one works too (and returns 404 because the last index does not exist):

curl -XGET "http://sdpprdap016.fr.world.socgen:9200/logs-ctyw-2019.11.10,logs-ctyw-2019.11.22,logs-ctyw-2019.12.03,logs-fic-20191121,logs-fic-20191122,logs-fic-20191126,logs-fic-20191128,logs-flow-2019.11.14,logs-flow-2019.11.16,logs-flow-2019.11.19,logs-flow-2019.11.21,logs-flow-2019.11.24,logs-flow-2019.11.29,logs-flow-2019.12.09,logs-fx-2019.11.22,logs-fx-20191113,logs-fxct-2019.11.02,logs-fxct-2019.11.04,logs-fxct-2019.11.05,logs-fxct-2019.11.06,logs-fxct-2019.11.10,logs-fxct-2019.12.01,logs-fxct-2019.12.09,logs-lisa-20191118,.kibana,.kibana,.ki/_stats/store,docs"

And with one more character this one does not work anymore (connection aborted):

curl -XGET "http://sdpprdap016.fr.world.socgen:9200/logs-ctyw-2019.11.10,logs-ctyw-2019.11.22,logs-ctyw-2019.12.03,logs-fic-20191121,logs-fic-20191122,logs-fic-20191126,logs-fic-20191128,logs-flow-2019.11.14,logs-flow-2019.11.16,logs-flow-2019.11.19,logs-flow-2019.11.21,logs-flow-2019.11.24,logs-flow-2019.11.29,logs-flow-2019.12.09,logs-fx-2019.11.22,logs-fx-20191113,logs-fxct-2019.11.02,logs-fxct-2019.11.04,logs-fxct-2019.11.05,logs-fxct-2019.11.06,logs-fxct-2019.11.10,logs-fxct-2019.12.01,logs-fxct-2019.12.09,logs-lisa-20191118,.kibana,.kibana,.kib/_stats/store,docs"

Or maybe I'm just totally confused and this is a coincidence of something else, like ELS being able to guess the load of a request based on its content, or something fancier.

Out of the box, Elasticsearch has a 4K character limit for request URLs. When there are many, many indices, Curator is designed to work within that framework by batching indices into requests of roughly 3K characters. I piped your curl request through wc -c and saw only 3138 characters, meaning that Curator is doing what it's supposed to do.

So, what does it mean? Perhaps the network stack on the machine hosting Elasticsearch has a limit of its own, which is dramatically less than Elasticsearch's 4K limit. Perhaps someone configured Elasticsearch to have a smaller limit (it can be done with settings in elasticsearch.yml). I'm not sure; that is a guess. You could test this theory by shortening the list of indices by degrees until the curl call returns, and counting how many characters it is. UPDATE: You did this while I was typing this response. You might check the network parameters on the machine itself. It might be some internal firewall rule, or something like that. Regardless, it appears that Curator is behaving as written. The 3K limit is not user configurable, unfortunately. I had not anticipated this sort of limit, since Elasticsearch isn't natively limited to such a short list of indices.

Perhaps you could use the count filter, extra pattern filters, or absolute date range filtering to reduce the size of the requests, at least until you find out whether this limit is configurable to permit larger requests.
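
For example, here is a sketch of the filters block from the minimal action file above with a hypothetical count filter added; the value 30 is an arbitrary illustration, not a recommendation:

    filters:
    - filtertype: pattern
      kind: prefix
      value: test-
    # keep only the 30 most recently named matching indices;
    # the rest are flagged for deletion
    - filtertype: count
      count: 30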

That turns out to be 563 characters. Seems your limit is 562, then.

Thanks for your help. I have a Kibana instance running on the same server, and it can definitely handle requests longer than 562 bytes.

I tried some odd requests to get a clue. All of the following work, and stop working if I add another "x". If I use the server name without the domain, I don't gain any extra characters.

curl -XGET "http://sdpprdap016.fr.world.socgen:9200/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/_stats/store,docs"
curl -XGET "http://sdpprdap016.fr.world.socgen:9200/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/_stats"
curl -XGET "http://sdpprdap016.fr.world.socgen:9200/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/_stats/"

So my guess is that the limitation is really on the ELS side and not on the server. But honestly, I don't get it yet. I will also think about adding more pattern filters, but it is not easy.

Sorry, I misspoke. The http://, host and port don't count towards the character count. It's everything after the first / that counts, which makes your limit 522 characters. What's ELS here? I don't recognize the initials from anything, though I could be missing the obvious. As stated, Elasticsearch itself will not have a problem with any requests up to 4K characters. It's something in between the server running Curator and your Elasticsearch machine.

What happens if you make one of these curl calls on the Elasticsearch machine to itself?

Sorry, ELS is the short name I use for Elasticsearch.

OK, so we count not the protocol and so on but just the characters after the first /; yet with my 3 examples the limit is never the same: it is 521, 517, and 511.

I ran the request on the host itself (sdpprdap016) and it worked! Which is interesting, but I don't know what to conclude yet :slight_smile:

Definitely, this is a reset from Elasticsearch itself.

From any other host, this request does not work:

curl -XGET     "http://sdpprdap016.fr.world.socgen:9200/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/_stats/store,docs"
curl: (56) Recv failure: Connection reset by peer

But this request (on the Kibana port, actually) from any other host works fine:

curl -XGET     "http://sdpprdap016.fr.world.socgen:5601/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/_stats/store,docs"
{"statusCode":404,"error":"Not Found","message":"Not Found"}

So I do not know why the request fails on every node of my production cluster whereas it works on my dev cluster. The configuration is quite simple and similar between DEV and PROD.

cluster.name: SGME-PROD
node.name: SDPPRDAP014
bootstrap.memory_lock: true
network.host: #IP_SERVER#
discovery.zen.ping.unicast.hosts: ["sdpprdap008.fr.world.socgen","sdpprdap009.fr.world.socgen","sdpprdap010.fr.world.socgen","sdpprdap011.fr.world.socgen","sdpprdap012.fr.world.socgen","sdpprdap013.fr.world.socgen","sdpprdap014.fr.world.socgen","sdpprdap015.fr.world.socgen","sdpprdap016.fr.world.socgen","sdpprdap017.fr.world.socgen","sdpprdap018.fr.world.socgen","sdpprdap019.fr.world.socgen","sdpprdap024.fr.world.socgen","sdpprdap025.fr.world.socgen","sdpprdap026.fr.world.socgen"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_data_nodes: 3
gateway.recover_after_time: 3m
gateway.expected_data_nodes: 9
node.master: false
node.data: false
node.ingest: true

It looks like, for some reason, Elasticsearch protects itself from requests that are too long when they come from a host other than localhost.

Querying port 5601 only demonstrates that it's not the operating system/kernel on that port. You could still have something happening at the system/kernel/Java level (perhaps AppArmor or SELinux, a network packet filter, or the like).

The default settings are as follows:

http.max_initial_line_length: The maximum length of an HTTP URL. Defaults to 4kb.
http.max_header_size: The maximum size of allowed headers. Defaults to 8kb.

If you haven't altered these in elasticsearch.yml, and you're hitting a 522-character limit, it's not Elasticsearch that is causing the issue.
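
For illustration, a lowered limit would look like the following in elasticsearch.yml; the 512b value is purely hypothetical, and nothing in this thread shows such a line on these nodes:

# hypothetical override: a line like this would lower the default 4kb
# URL limit to roughly the ~522-character ceiling observed above
http.max_initial_line_length: 512b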

Thanks. I changed the port of one of my nodes from 9200 to 9201 and it worked!
So I definitely need to find which process is playing with incoming requests on 9200 :wink:
