Unable to initialize Fleet - An internal server error occurred

Hi,

I'm a bit at a loss here. Today I upgraded my 7.10.0 dev cluster to 7.11.0, and when I go to Fleet in Kibana I get a nice red box: "Unable to initialize Fleet - An internal server error occurred".

Let's rewind a bit. I started playing with Ingest Manager on 7.9.2, fiddled around a bit, and let it rest due to time constraints. When 7.10 came out I upgraded the cluster and remember it working under the new name, Fleet. Fast forward a month or so and I noticed the red box all of a sudden, and again let it be for future me to solve. Enter 7.11: I thought maybe it would solve the issue, but I still got the same init failure.

So now I decided to investigate.
Kibana gives me ample logs: from the request I made in the browser, to a message about a transform, to a timeout and a clear error, with nothing in between.

{"type":"response","@timestamp":"2021-02-15T22:23:47+01:00","tags":["access:fleet-read"],"pid":6159,"method":"get","statusCode":200,"req":{"url":"/api/fleet/agents/setup","method":"get","headers":{"host":"kib01.tld.local:5601","connection":"keep-alive","kbn-version":"7.11.0","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36","content-type":"application/json","accept":"*/*","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"https://kib01.tld.local:5601/app/fleet","accept-encoding":"gzip, deflate, br","accept-language":"nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7"},"remoteAddress":"10.40.31.75","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36","referer":"https://kib01.tld.local:5601/app/fleet"},"res":{"statusCode":200,"responseTime":53,"contentLength":9},"message":"GET /api/fleet/agents/setup 200 53ms - 9.0B"}
{"type":"log","@timestamp":"2021-02-15T22:23:47+01:00","tags":["debug","plugins","security","basic","basic"],"pid":6159,"message":"Trying to authenticate user request to /api/fleet/setup."}
{"type":"log","@timestamp":"2021-02-15T22:23:47+01:00","tags":["debug","plugins","security","basic","basic"],"pid":6159,"message":"Trying to authenticate via state."}
{"type":"log","@timestamp":"2021-02-15T22:23:47+01:00","tags":["debug","plugins","security","basic","basic"],"pid":6159,"message":"Request has been authenticated via state."}
{"type":"log","@timestamp":"2021-02-15T22:23:47+01:00","tags":["debug","plugins","security","api-authorization"],"pid":6159,"message":"User authorized for \"/api/fleet/setup\""}

The transform log message

{"type":"log","@timestamp":"2021-02-15T22:23:57+01:00","tags":["info","plugins","fleet"],"pid":6159,"message":"Found previous transform references:\n [{\"id\":\"endpoint.metadata_current-default-0.16.1\",\"type\":\"transform\"}]"}
{"type":"log","@timestamp":"2021-02-15T22:23:57+01:00","tags":["info","plugins","fleet"],"pid":6159,"message":"Deleting currently installed transform ids endpoint.metadata_current-default-0.16.1"}

And finally the time-out and error

{"type":"log","@timestamp":"2021-02-15T22:24:27+01:00","tags":["error","plugins","fleet"],"pid":6159,"message":"Request Timeout after 30000ms"}
{"type":"log","@timestamp":"2021-02-15T22:24:27+01:00","tags":["error","http"],"pid":6159,"message":"Error: options.statusCode is expected to be set. given options: undefined\n    at Object.customError (/usr/share/kibana/src/core/server/http/router/response.js:136:13)\n    at defaultIngestErrorHandler (/usr/share/kibana/x-pack/plugins/fleet/server/errors/handlers.js:117:19)\n    at FleetSetupHandler (/usr/share/kibana/x-pack/plugins/fleet/server/routes/setup/handlers.js:111:50)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)\n    at Router.handle (/usr/share/kibana/src/core/server/http/router/router.js:163:30)\n    at handler (/usr/share/kibana/src/core/server/http/router/router.js:124:50)\n    at module.exports.internals.Manager.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/toolkit.js:45:28)\n    at Object.internals.handler (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)\n    at exports.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)\n    at Request._lifecycle (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:312:32)\n    at Request._execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:221:9)"}
{"type":"error","@timestamp":"2021-02-15T22:23:47+01:00","tags":[],"pid":6159,"level":"error","error":{"message":"Internal Server Error","name":"Error","stack":"Error: Internal Server Error\n    at HapiResponseAdapter.toInternalError (/usr/share/kibana/src/core/server/http/router/response_adapter.js:58:19)\n    at Router.handle (/usr/share/kibana/src/core/server/http/router/router.js:177:34)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)\n    at handler (/usr/share/kibana/src/core/server/http/router/router.js:124:50)\n    at module.exports.internals.Manager.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/toolkit.js:45:28)\n    at Object.internals.handler (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)\n    at exports.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)\n    at Request._lifecycle (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:312:32)\n    at Request._execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:221:9)"},"url":"https://kib01.tld.local:5601/api/fleet/setup","message":"Internal Server Error"}
  • I can reach epr.elastic.co from the Kibana server and get a response from the API. Also, the logs don't show it timing out on the package registry.
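
For anyone reading along: with debug logging on, the two Fleet error lines are easy to lose among the access and auth entries. Below is a minimal Python sketch that keeps only entries tagged both "error" and "fleet", assuming the one-JSON-object-per-line format shown above. The function name `fleet_errors` is made up for illustration and is not part of Kibana.

```python
import json


def fleet_errors(log_lines):
    """Return (timestamp, first line of message) for entries tagged 'error' and 'fleet'.

    Assumes one JSON object per line, as in the Kibana log excerpts above.
    """
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        tags = entry.get("tags") or []
        if "error" in tags and "fleet" in tags:
            message = entry.get("message") or ""
            first_line = message.splitlines()[0] if message else ""
            hits.append((entry.get("@timestamp"), first_line))
    return hits
```

Fed the lines from the log above, this should leave just the "Request Timeout after 30000ms" entry; the follow-up stack trace is tagged "http" rather than "fleet", so it is filtered out.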

At first I thought it was the same as this issue.
But it didn't match 100% with its server logs and other evidence. Still, I cannot get rid of that transform job, or it disappeared when 7.11 was installed. I also can't stop or start it, because Elastic says it has started.


It's an old job that, I'm guessing (as you can see, it's from last year), hit the same issue as in the link while on 7.10, but has now got stuck in some twilight zone on 7.11.

  • Is my assumption correct in linking Fleet not starting to this?
  • Any ideas on how I can correct this?
  • Or could there be a different reason why Fleet is not starting anymore?

Just throwing a small bit of spaghetti, thanks in advance!

Apologies for the inconvenience.

The /api/fleet/setup endpoint will always try to update the endpoint (and system) package to the latest version, which is where this error happens.

Can you try to update the endpoint package manually with

curl -X POST -u $USER:$PASS $KIBANA_URL/api/fleet/epm/packages/endpoint-0.17.1 -H 'kbn-xsrf: xyz'

(Please note that this request needs to go to the Kibana server, so you can't use Kibana Dev Tools. It really needs to be done with curl or an equivalent tool.

To get the correct package version in cases like this, you can look at https://epr.elastic.co/search?experimental=true&kibana.version=7.11.0 .)
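
As a rough illustration of that lookup, here is a small Python sketch that takes the already-parsed result of the registry search and builds the install path used in the curl command. It assumes the search returns a JSON array of package objects with `name` and `version` fields; the function name `install_path` is made up for this example, and the "system" version in the usage note is a placeholder.

```python
def install_path(search_results, package_name):
    """Build the Fleet install path for the version the registry lists.

    search_results: parsed JSON from the registry search (assumed to be a
    list of objects carrying at least "name" and "version").
    Returns None when the package is not in the results.
    """
    for pkg in search_results:
        if pkg.get("name") == package_name and "version" in pkg:
            return f"/api/fleet/epm/packages/{package_name}-{pkg['version']}"
    return None
```

For example, `install_path([{"name": "system", "version": "0.0.0"}, {"name": "endpoint", "version": "0.17.1"}], "endpoint")` gives `/api/fleet/epm/packages/endpoint-0.17.1`, which is the path to POST to on the Kibana server.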

My hope would be to get at least a better error or log message.

Hi Sonja,

I've seen your solution before, but the problem is that Fleet won't start or isn't started (I don't know if it works this way), so I get the same timeout as Kibana does, since it uses the same API to communicate with Fleet, I guess.

{"type":"response","@timestamp":"2021-02-16T14:35:34+01:00","tags":[],"pid":1432,"method":"get","statusCode":200,"req":{"url":"/api/transform/transforms/_stats","method":"get","headers":{"host":"kib01.tld.local:5601","connection":"keep-alive","kbn-version":"7.11.0","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36","content-type":"application/json","accept":"*/*","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"https://kib01.tld.local:5601/app/management/data/transform","accept-encoding":"gzip, deflate, br","accept-language":"nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7"},"remoteAddress":"10.40.31.75","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36","referer":"https://kib01.tld.local:5601/app/management/data/transform"},"res":{"statusCode":200,"responseTime":31,"contentLength":9},"message":"GET /api/transform/transforms/_stats 200 31ms - 9.0B"}
{"type":"log","@timestamp":"2021-02-16T14:35:38+01:00","tags":["error","plugins","fleet"],"pid":1432,"message":"Request Timeout after 30000ms"}
{"type":"log","@timestamp":"2021-02-16T14:35:38+01:00","tags":["error","http"],"pid":1432,"message":"Error: options.statusCode is expected to be set. given options: undefined\n    at Object.customError (/usr/share/kibana/src/core/server/http/router/response.js:136:13)\n    at defaultIngestErrorHandler (/usr/share/kibana/x-pack/plugins/fleet/server/errors/handlers.js:117:19)\n    at installPackageFromRegistryHandler (/usr/share/kibana/x-pack/plugins/fleet/server/routes/epm/handlers.js:235:71)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)\n    at Router.handle (/usr/share/kibana/src/core/server/http/router/router.js:163:30)\n    at handler (/usr/share/kibana/src/core/server/http/router/router.js:124:50)\n    at module.exports.internals.Manager.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/toolkit.js:45:28)\n    at Object.internals.handler (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)\n    at exports.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)\n    at Request._lifecycle (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:312:32)\n    at Request._execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:221:9)"}
{"type":"error","@timestamp":"2021-02-16T14:35:00+01:00","tags":[],"pid":1432,"level":"error","error":{"message":"Internal Server Error","name":"Error","stack":"Error: Internal Server Error\n    at HapiResponseAdapter.toInternalError (/usr/share/kibana/src/core/server/http/router/response_adapter.js:58:19)\n    at Router.handle (/usr/share/kibana/src/core/server/http/router/router.js:177:34)\n    at processTicksAndRejections (internal/process/task_queues.js:93:5)\n    at handler (/usr/share/kibana/src/core/server/http/router/router.js:124:50)\n    at module.exports.internals.Manager.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/toolkit.js:45:28)\n    at Object.internals.handler (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)\n    at exports.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)\n    at Request._lifecycle (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:312:32)\n    at Request._execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:221:9)"},"url":"https://kib01.tld.local:5601/api/fleet/epm/packages/endpoint-0.17.1","message":"Internal Server Error"}
{"type":"response","@timestamp":"2021-02-16T14:35:00+01:00","tags":["access:fleet-all"],"pid":1432,"method":"post","statusCode":500,"req":{"url":"/api/fleet/epm/packages/endpoint-0.17.1","method":"post","headers":{"user-agent":"curl/7.29.0","host":"kib01.tld.local:5601","accept":"*/*","kbn-xsrf":"xyz"},"remoteAddress":"10.40.251.144","userAgent":"curl/7.29.0"},"res":{"statusCode":500,"responseTime":38258,"contentLength":9},"message":"POST /api/fleet/epm/packages/endpoint-0.17.1 500 38258ms - 9.0B"}

It may be helpful to clear out the transform state. Using force=true will help clear out inconsistent states and try anyway.

Try the following to stop and delete the transform:

curl -X POST <elasticsearch>/_transform/endpoint.metadata_current-default-0.16.1/_stop?force=true
curl -X DELETE <elasticsearch>/_transform/endpoint.metadata_current-default-0.16.1?force=true
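
If curl is not handy, the same two calls can be sketched in Python with only the standard library. The helper below just builds the requests and does not send them; `force_cleanup_requests` is a made-up name, and authentication and TLS settings are left out because they depend on the cluster.

```python
from urllib.parse import quote
from urllib.request import Request


def force_cleanup_requests(es_url, transform_id):
    """Build (without sending) the force-stop and force-delete requests.

    es_url is the Elasticsearch base URL, i.e. the <elasticsearch>
    placeholder in the curl commands above.
    """
    base = f"{es_url.rstrip('/')}/_transform/{quote(transform_id, safe='')}"
    stop = Request(f"{base}/_stop?force=true", method="POST")
    delete = Request(f"{base}?force=true", method="DELETE")
    return [stop, delete]
```

Each returned object can be passed to `urllib.request.urlopen` in order, once credentials have been added for the cluster in question.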

When I do the stop, it is acknowledged (in Dev Tools and on the CLI on an Elastic node):
{ "acknowledged" : true }
but the delete times out (Dev Tools):
{"statusCode":502,"error":"Bad Gateway","message":"Client request timeout"}
and returns an error/timeout (CLI on an Elastic node):
{"error":{"root_cause":[{"type":"status_exception","reason":"Could not stop the transforms [endpoint.metadata_current-default-0.16.1] as they timed out [30s]."}],"type":"status_exception","reason":"Could not stop the transforms [endpoint.metadata_current-default-0.16.1] as they timed out [30s]."},"status":408}

@pzl not sure what to make of this and where this request stalls..

Thanks for giving that a try. I have created a Kibana issue to track what's going on here, and to find a solution.

There may be a lead here.

Can you check your node roles? You can do that with GET /_nodes and looking at the roles array for each node.

To run fleet / security you must have at least one node with the "transform" role.
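
A quick way to run that check over a saved `GET /_nodes` response is sketched below in Python. It relies on the response having a top-level `nodes` object keyed by node ID, with `name` and `roles` on each entry; the function name `nodes_with_role` is made up for this example.

```python
def nodes_with_role(nodes_response, role="transform"):
    """Return the sorted names of nodes whose roles array contains `role`.

    nodes_response: parsed JSON body of GET /_nodes.
    Falls back to the node ID when an entry has no "name".
    """
    nodes = nodes_response.get("nodes", {})
    return sorted(
        info.get("name", node_id)
        for node_id, info in nodes.items()
        if role in info.get("roles", [])
    )
```

An empty list means no node carries the "transform" role, which is the situation described above where transform requests have nowhere to run.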

This makes a ton of sense, will test and get back to you on it!

@pzl You were right. It makes sense too, because I introduced a master after we upgraded to 7.10, hardcoding roles and stuff.

Such a small oversight.. :blush: Sometimes the logs stare you right in the face without you really seeing it.
Thank you very much, Sonja, Dan!

This fixed the issue for me. Thank you!