I've got a test cluster and a production cluster, each with its own separate monitoring cluster. "Review deprecated settings and resolve issues" in the Kibana Upgrade Assistant works on three of the four clusters; on the production cluster the message "Could not retrieve Elasticsearch deprecation issues." consistently appears instead.
I get zero results for that phrase on a Google search, which is disconcerting.
Each attempt to view the deprecated settings produces a pair of error messages in the Kibana log:
{"type":"log","@timestamp":"2022-09-27T14:35:15+01:00","tags":["error","http"],"pid":1656,"message":"ConnectionError: connect EMFILE 10.70.12.10:9200 - Local (undefined:undefined)\n at ClientRequest.onError (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/Connection.js:123:16)\n at ClientRequest.emit (node:events:390:28)\n at TLSSocket.socketErrorListener (node:_http_client:447:9)\n at TLSSocket.emit (node:events:390:28)\n at emitErrorNT (node:internal/streams/destroy:157:8)\n at emitErrorCloseNT (node:internal/streams/destroy:122:3)\n at processTicksAndRejections (node:internal/process/task_queues:83:21) {\n meta: {\n body: null,\n statusCode: null,\n headers: null,\n meta: {\n context: null,\n request: [Object],\n name: 'elasticsearch-js',\n connection: [Object],\n attempts: 0,\n aborted: false\n }\n },\n isBoom: true,\n isServer: true,\n data: null,\n output: {\n statusCode: 503,\n payload: {\n statusCode: 503,\n error: 'Service Unavailable',\n message: 'connect EMFILE 10.70.12.10:9200 - Local (undefined:undefined)'\n },\n headers: {}\n },\n [Symbol(SavedObjectsClientErrorCode)]: 'SavedObjectsClient/esUnavailable'\n}"}
{"type":"error","@timestamp":"2022-09-27T14:35:03+01:00","tags":[],"pid":1656,"level":"error","error":{"message":"Internal Server Error","name":"Error","stack":"Error: Internal Server Error\n at HapiResponseAdapter.toInternalError (/usr/share/kibana/src/core/server/http/router/response_adapter.js:61:19)\n at Router.handle (/usr/share/kibana/src/core/server/http/router/router.js:172:34)\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at handler (/usr/share/kibana/src/core/server/http/router/router.js:124:50)\n at exports.Manager.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)\n at Object.internals.handler (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)\n at exports.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)\n at Request._lifecycle (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:371:32)\n at Request._execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:281:9)"},"url":"https://foo.domain:5601/api/upgrade_assistant/es_deprecations","message":"Internal Server Error"}
The test and production cluster Kibana instances are accessed via a load balancer, and browser dev tools show that the request for deprecated Elasticsearch settings is made via a URL that doesn't contain the Kibana server hostname. For example, the Kibana server hostname is foo.domain, but the Kibana instance is accessed via https://logs.domain/my_kibana/api/upgrade_assistant/es_deprecations
or for the test cluster https://logs-test.domain/my_kibana/api/upgrade_assistant/es_deprecations
Viewing the logs-test URL in a web browser consistently returns JSON-formatted details of deprecation issues. Attempting to view the production URL consistently results in a delay of somewhere between 10 and 20 seconds and then a 500 Internal Server Error response. Attempting to access the URL from the log above, https://foo.domain:5601/api/upgrade_assistant/es_deprecations, using curl on foo.domain itself results in the same behaviour.
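For reference, the direct check looks roughly like this; the user, password, and -k flag stand in for my actual authentication and TLS settings:

# Time the request and report the status code; ~10-20 s then HTTP 500 in my case
curl -sk -u 'user:password' -w '\nHTTP %{http_code} after %{time_total}s\n' \
  'https://foo.domain:5601/api/upgrade_assistant/es_deprecations'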
I can't find any differences in the setup between test and production, but the production cluster does contain far more data than any of the others. Could it be that something involved in retrieving deprecation issues can't cope with 160 billion documents across 2,400 open indices, plus another 2,500 closed indices, most of which were created with Elasticsearch 6 and so, I believe, should appear as deprecation issues?
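One way I can think of to narrow this down, on the assumption that the Upgrade Assistant ultimately calls Elasticsearch's deprecation info API: query that API directly on a node and see whether the delay and failure happen inside Elasticsearch itself or in Kibana's handling of a very large response. Something like (credentials are placeholders; 10.70.12.10 is the node from the Kibana log above):

# Ask Elasticsearch for deprecation issues directly, bypassing Kibana
curl -sk -u 'user:password' -w '\nHTTP %{http_code} after %{time_total}s\n' \
  'https://10.70.12.10:9200/_migration/deprecations'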
I can't find any relevant errors in the Elasticsearch log on the Kibana server or on the master node. Setting Kibana logging to logging.root.level: debug doesn't produce any error messages beyond the two above.
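Given the EMFILE above, the other thing I plan to check is the file descriptor limit the service was started with, assuming Kibana runs under systemd with a unit named kibana.service:

# The limit systemd applies to the unit (compare with the /proc figures above)
systemctl show kibana.service --property=LimitNOFILE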
Can anyone give me some pointers on how to get more information about why that 500 Internal Server Error occurs, or offer an idea of why it's happening and/or what to do about it?