Cloud UI logs URL points to undefined app

Hi ECE Team,

I'm running some tests on ECE.

I created some clusters, but when I try to view logs in the Cluster / <my_cluster> tab / Overview, the links point to /undefined/app/kibana.

Example for the Elasticsearch logs: http://slece001.maif.local:12400/undefined/app/kibana#/discover?_a=(columns:!(message),index:'cluster-logs-*',query:(query_string:(query:'ece.cluster:"eb1f0e8c82dc4425b32250fb5b3d0ec5"')))&_g=(time:(from:now-1h,mode:quick,to:now))
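To see where the bad link goes wrong, it can be taken apart with Python's standard urllib.parse. This is only a sketch for inspecting the link quoted above; the helper name inspect_logs_link is hypothetical. The first path segment, which presumably should come from the cluster's logging location, is the literal string "undefined", while the cluster ID inside the Kibana query is intact:

```python
import re
from urllib.parse import urlsplit

# The link from the Overview tab, copied verbatim from the post above.
LOGS_LINK = (
    "http://slece001.maif.local:12400/undefined/app/kibana#/discover?"
    "_a=(columns:!(message),index:'cluster-logs-*',"
    "query:(query_string:(query:'ece.cluster:\"eb1f0e8c82dc4425b32250fb5b3d0ec5\"')))"
    "&_g=(time:(from:now-1h,mode:quick,to:now))"
)


def inspect_logs_link(url: str) -> dict:
    """Split a Kibana Discover link into the parts relevant to this issue."""
    parts = urlsplit(url)
    # Everything after '#' is Kibana app state, so urlsplit puts the
    # '?_a=...' portion in the fragment, not in parts.query.
    match = re.search(r'ece\.cluster:"([0-9a-f]+)"', parts.fragment)
    return {
        "host": parts.netloc,
        "base_segment": parts.path.split("/")[1],  # "undefined" in the broken link
        "kibana_path": parts.path,
        "cluster_id": match.group(1) if match else None,
    }


print(inspect_logs_link(LOGS_LINK))
```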

What did I miss in my installation process that could explain this?

Regards,
Johnny

Hi Johnny,

Thanks so much for testing out ECE and for reporting this issue! What seems to be happening here is that the installation is missing the Logging and Metrics cluster, but I'm not sure how that could have happened. Can you verify that you see this cluster in the cluster list?

Also, could you please send us the response from the API call the UI makes for any cluster that is experiencing this issue? If you open the dev tools in your browser and go to the Network tab, you'll see the api/v0.1/regions/ece-region/clusters/eb1f0e8c82dc4425b32250fb5b3d0ec5 API call. Can you copy and paste the response here?

Thanks again,
Andrew

Hi @JohnnyB

I had a quick look at everything that could cause hrefs.logging and hrefs.metrics to be missing from the API call @Andrew_Moldovan mentioned; their absence is what produces that faulty URL.
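If you have the JSON body of that clusters/<id> call saved (for example, copied from the browser's Network tab), the two fields can be checked offline. This is a sketch under assumptions: it assumes hrefs is a top-level object in the response with logging and metrics keys, as the field names above suggest, and missing_log_links is a hypothetical helper, not part of ECE:

```python
import json

# Keys the UI reads to build the Logs and Metrics links (names taken from
# the post above; their position in the response is an assumption).
REQUIRED_HREFS = ("logging", "metrics")


def missing_log_links(response_body: str) -> list:
    """Return the required hrefs keys that are absent or empty in a cluster response."""
    data = json.loads(response_body)
    hrefs = data.get("hrefs") or {}  # assumed top-level "hrefs" object
    return [key for key in REQUIRED_HREFS if not hrefs.get(key)]


# Hypothetical, heavily trimmed responses for illustration only.
healthy = '{"hrefs": {"logging": "https://example/logs", "metrics": "https://example/metrics"}}'
broken = '{"hrefs": {}}'
print(missing_log_links(healthy))  # []
print(missing_log_links(broken))   # ['logging', 'metrics']
```

An empty list means both links have a target. A non-empty list matches the symptom in this thread: in JavaScript, interpolating an undefined value into a string yields the text "undefined", which is what ends up in the link path.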

The most likely cause is that the install failed before the location of the logging cluster could be written. Could you check your /mnt/data/elastic/logs/bootstrap-logs/bootstrap.log? At the end of my logs I have a set of entries like:

[2017-06-20 18:19:51,848][INFO ][no.found.bootstrap.BootstrapInitial] Enabling Kibana for Logging and Metrics Cluster {}
[2017-06-20 18:19:52,400][INFO ][no.found.bootstrap.ServiceLayerBootstrap] waiting for cluster [KibanaCluster(6434b73e48ae464997a3ad4e8e65eec7)] {}
[2017-06-20 18:19:52,407][INFO ][no.found.bootstrap.ServiceLayerBootstrap] Waiting for [waiting-for-cluster] to complete. Retrying every [1 second] (cause: [java.lang.Exception: not yet started]) {}
[2017-06-20 18:24:24,664][INFO ][no.found.bootstrap.BootstrapInitial] Shutting down bootstrapper {}

Do you have any timeout errors there at all?
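A quick way to answer that on a long log is to pull out only the ERROR entries and any line mentioning a TimeoutException. A minimal sketch, assuming the "[timestamp][LEVEL][logger] message" layout shown in the excerpts in this thread; find_bootstrap_failures is a hypothetical helper:

```python
import re
from pathlib import Path

# Path taken from the post above; adjust if your ECE data directory differs.
BOOTSTRAP_LOG = Path("/mnt/data/elastic/logs/bootstrap-logs/bootstrap.log")

# Entries look like "[2017-06-09 13:56:34,678][ERROR][logger] message".
LEVEL = re.compile(r"^\[[^\]]+\]\[(\w+)\s*\]")


def find_bootstrap_failures(lines):
    """Yield ERROR entries and any line mentioning a TimeoutException."""
    for line in lines:
        match = LEVEL.match(line)
        if (match and match.group(1) == "ERROR") or "TimeoutException" in line:
            yield line.rstrip("\n")


if __name__ == "__main__":
    if BOOTSTRAP_LOG.exists():
        with BOOTSTRAP_LOG.open(encoding="utf-8", errors="replace") as handle:
            for hit in find_bootstrap_failures(handle):
                print(hit)
    else:
        print(f"{BOOTSTRAP_LOG} not found")
```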

(We have been seeing some issues where the Kibana Docker image download and the subsequent cluster provisioning can take longer than the 10-minute timeout we allocate. We are raising that timeout and updating the docs to explain how to address it for 1.0.2.)

Alex

Hi @Andrew_Moldovan

The Logging and Metrics cluster has existed since I installed ECE.

The request you asked about succeeds, but the response is too big to share in a post. How can I upload a HAR file to this topic?

I also noticed many GET requests to http://slece001.maif.local:12400/api/v0.1/regions/ece-region/phone-home/data, which all return a 404 error code. What are these?

Hi @Alex_Piggott,

I can see the exception you mentioned:

[2017-06-09 13:46:05,968][INFO ][no.found.bootstrap.ServiceLayerBootstrap] Waiting for [ensuring-plan] to complete. Retrying every [1 second] (cause: [java.lang.Exception: not yet started]) {}
[2017-06-09 13:46:34,375][INFO ][no.found.bootstrap.BootstrapInitial] Enabling Kibana for Logging and Metrics Cluster {}
[2017-06-09 13:46:34,673][INFO ][no.found.bootstrap.ServiceLayerBootstrap] waiting for cluster [KibanaCluster(bf7f8aeda102430381bb515c06e5d9c6)] {}
[2017-06-09 13:46:34,696][INFO ][no.found.bootstrap.ServiceLayerBootstrap] Waiting for [waiting-for-cluster] to complete. Retrying every [1 second] (cause: [java.lang.Exception: not yet started]) {}
[2017-06-09 13:56:34,678][ERROR][no.found.bootstrap.BootstrapInitial$] Unhandled error. {}
java.util.concurrent.TimeoutException: Futures timed out after [600 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at no.found.bootstrap.BootstrapInitial.bootstrapLoggingMetricsCluster(BootstrapInitial.scala:958)
at no.found.bootstrap.BootstrapInitial.bootstrap(BootstrapInitial.scala:611)
at no.found.bootstrap.BootstrapInitial$.delayedEndpoint$no$found$bootstrap$BootstrapInitial$1(BootstrapInitial.scala:1153)
at no.found.bootstrap.BootstrapInitial$delayedInit$body.apply(BootstrapInitial.scala:1147)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at no.found.bootstrap.BootstrapInitial$.main(BootstrapInitial.scala:1147)
at no.found.bootstrap.BootstrapInitial.main(BootstrapInitial.scala)

About Kibana: I also noticed that provisioning Kibana nodes, or moving them from one allocator to another, takes a very long time (in fact, I end up cancelling and re-applying the change to make progress). Should I open a new topic to describe what I observed?

Johnny

@JohnnyB

Thanks for posting the logs!

They do confirm that the install timed out right at the end, because of the combined time taken to download the Kibana Docker image and to provision the cluster.
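The timestamps in the posted log line up with that explanation: the bootstrapper starts waiting for the Kibana cluster at 13:46:34,673 and reports the timeout at 13:56:34,678, roughly the [600 seconds] in the TimeoutException. A small sketch for computing that gap with Python's datetime; the timestamp format is read off the log lines, and entry_time is a hypothetical helper:

```python
from datetime import datetime

# Timestamp format used at the start of each bootstrap.log entry.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def entry_time(line: str) -> datetime:
    """Parse the leading [timestamp] of a bootstrap.log entry."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


started = entry_time("[2017-06-09 13:46:34,673][INFO ][no.found.bootstrap.ServiceLayerBootstrap] "
                     "waiting for cluster [KibanaCluster(bf7f8aeda102430381bb515c06e5d9c6)] {}")
failed = entry_time("[2017-06-09 13:56:34,678][ERROR][no.found.bootstrap.BootstrapInitial$] Unhandled error. {}")

elapsed = (failed - started).total_seconds()
print(f"{elapsed:.3f} s")  # 600.005 s, matching the 600-second limit
```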

As I mentioned, we are addressing that timeout problem in July's release. In the meantime, getting a working version unfortunately requires blowing away and re-installing:

It might be worth retrying the Kibana re-allocation with the new install, though I am not aware of any steps at the end of the installation that would affect it, so the problem will likely persist. Please do open a new thread and we'll look into it.

Alex

@JohnnyB Don't worry about posting the HAR file now; we know the cause of the issue, as @Alex_Piggott described.

The 404 for phone-home/data that you are seeing is because you opted out of sending us metrics information about your installation. We had a bug in the UI that would continuously try to fetch that data even though you had opted out (edit: it would not actually send anything, because of the opt-out, hence the error). The bug is fixed, and you shouldn't see these errors anymore in the July release. Sorry about that.
