ECE startup after a server restart


#1

If servers are restarted after a temporary shutdown, how do I check whether ECE has started up again?

I can't access the Cloud UI, so are there any command-line checks I can do?


(Patroklos Papapetrou) #2

Hi @sayeedch
For a start you can check some things by running

docker ps
or
systemctl status docker
or
journalctl -u docker
to see if something is wrong

We don't yet have any scripts to check running status and similar issues.
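
As a rough sketch only, the three checks above could be wrapped in one small shell function. The name check_docker_host and its messages are made up for illustration and are not part of ECE; the sketch assumes a systemd host with the docker CLI on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with ECE: combines the manual checks above.
check_docker_host() {
  # Ask systemd whether the Docker daemon is active.
  if ! systemctl is-active --quiet docker; then
    echo "docker daemon: NOT running"
    # Show the most recent daemon log lines to help find the cause.
    journalctl -u docker --no-pager -n 20
    return 1
  fi
  echo "docker daemon: running"

  # List the running containers with their status.
  docker ps --format '{{.Names}}\t{{.Status}}'
}
```

Call check_docker_host on each host after the restart. A non-zero exit code means the daemon itself is down, so no ECE containers can be running on that host.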


(Yuri Tceretian) #3

@sayeedch to add to what @Patroklos_Papapetrou said, you can also check the status of the runner on that host via the UI (see the "Runners" tab). If it is running (status is green), then all other services on that host are very likely running too. If they aren't, the runner will start them. Also, if the node has the allocator service, you can check its status as well (see the "Allocators" tab).


#4

Thanks guys.

I ran the above commands and it appeared that the docker daemon had failed to start. It wasn't configured to restart after a reboot. I fixed the daemon issue using the information here: https://stackoverflow.com/questions/39100641/docker-service-start-failed
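
For reference, a minimal sketch of making the daemon come back after a reboot, assuming a systemd host where the unit is named docker. The function name ensure_docker_on_boot is only illustrative.

```shell
# Hypothetical helper: enable the docker unit at boot if it is not already enabled.
ensure_docker_on_boot() {
  if [ "$(systemctl is-enabled docker 2>/dev/null)" != "enabled" ]; then
    sudo systemctl enable docker
  fi
  # Print the resulting state so it can be verified.
  systemctl is-enabled docker
}
```

Enabling the unit only affects the next boot. It does not start a daemon that is currently stopped.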

However, by running the remove command, I ended up deleting all the ECE images on the coordinator box. I am running the coordinator and the allocator on separate boxes.

I tried pulling the images by running the ECE installation script again on the coordinator but that resulted in failure.

  • Running Bootstrap container
  • Monitoring bootstrap process
  • Loaded bootstrap settings {}
  • Unhandled error. {}
    -- An error has occurred in bootstrap process. Please examine logs --
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~
    Errors have caused Elastic Cloud Enterprise installation to fail - Please chec k logs
    Node type - initial
  
This is what I can see in the bootstrap logs:

9-07 14:53:21,111][INFO ][no.found.bootstrap.BootstrapInitial] Loaded bootstrap settings {}
[2017-09-07 14:53:21,625][WARN ][no.found.docker.DockerContainerManager] Default registry [https://index.docker.io/v1/] has no auths. Known auths: [List()] {}
[2017-09-07 14:53:23,642][INFO ][no.found.docker.DockerContainerManager] Creating container [frc-zookeeper-servers-zookeeper] {"ec_container_kind":"docker","ec_container_group":"zookeeper-servers","ec_container_name":"zookeeper"}
[2017-09-07 14:53:23,642][INFO ][no.found.docker.DockerContainerManager] Creating container [frc-client-forwarders-client-forwarder] {"ec_container_kind":"docker","ec_container_group":"client-forwarders","ec_container_name":"client-forwarder"}
[2017-09-07 14:53:23,673][INFO ][no.found.docker.DockerContainerManager] Starting container [frc-zookeeper-servers-zookeeper] {"ec_container_kind":"docker","ec_container_group":"zookeeper-servers","ec_container_name":"zookeeper"}
[2017-09-07 14:53:23,687][INFO ][no.found.docker.DockerContainerManager] Starting container [frc-client-forwarders-client-forwarder] {"ec_container_kind":"docker","ec_container_group":"client-forwarders","ec_container_name":"client-forwarder"}
[2017-09-07 14:53:24,561][INFO ][org.apache.curator.framework.imps.CuratorFrameworkImpl] Starting {}
[2017-09-07 14:53:24,576][INFO ][no.found.curator.ForwardedEnsembleProvider] Unable to read servers list from [http://172.16.3.5:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1], falling back to [0.0.0.0:2181] {}
[2017-09-07 14:53:24,577][INFO ][no.found.curator.ForwardedEnsembleProvider] Resolved connection string from [http://172.16.3.5:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1] to [0.0.0.0:2181/v1] with local namespace [/v1] {}
[2017-09-07 14:53:24,598][INFO ][no.found.curator.ForwardedEnsembleProvider] Unable to read servers list from [http://172.16.3.5:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1], falling back to [0.0.0.0:2181] {}
[2017-09-07 14:53:24,598][INFO ][no.found.curator.ForwardedEnsembleProvider] Resolved connection string from [http://172.16.3.5:2180/zookeeper/clients/ensemble/connection-string?namespace=/v1] to [0.0.0.0:2181/v1] with local namespace [/v1] {}


Luckily, I'm testing this in development as it's a new product for us. How do I fix the coordinator so that I can reconnect it with the allocator, which is running fine according to docker?

I don't have access to the Cloud UI as it only runs on the coordinator.

(Yuri Tceretian) #5

@sayeedch
Go to the host where you first installed ECE. Open the file {HOST_STORAGE_PATH}/bootstrap_state/bootstrap_secrets.json, where HOST_STORAGE_PATH is the path that you specified during installation (argument --host-storage-path). If you did not specify it, the path is /mnt/data/elastic/bootstrap_state/....
This file contains very sensitive information, and we highly recommend moving it to a secure location. The file contains a JSON object. Find the field emergency_all_roles_except_allocator_token and copy its value. This token allows you to bootstrap an ECE instance with all roles except allocator.
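
A minimal sketch of reading that value from the command line, assuming the field sits at the top level of the JSON object and that either jq or python3 is installed on the host. The function name read_emergency_token is only illustrative.

```shell
# Hypothetical helper: print the emergency token from a bootstrap secrets file.
# Assumes the field is a top-level key of the JSON object.
read_emergency_token() {
  local secrets_file="$1"
  if command -v jq >/dev/null 2>&1; then
    jq -r '.emergency_all_roles_except_allocator_token' "$secrets_file"
  else
    # Fallback when jq is not installed.
    python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["emergency_all_roles_except_allocator_token"])' "$secrets_file"
  fi
}
```

For example: read_emergency_token "${HOST_STORAGE_PATH:-/mnt/data/elastic}/bootstrap_state/bootstrap_secrets.json". Since the token is sensitive, avoid pasting the output into shared logs or chat.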

Then follow the instructions described in "Using the Emergency Roles Token".

Hope it helps. Please let me know if it worked for you or not

