Hi there,
I am testing ECE beta 2 on Ubuntu 16.04. On one test machine the first installation does not work; if I then remove the Docker images and restart the installation script, it works. As a first, high-level step I would like to ask whether anyone else has seen this behaviour and whether it makes sense. To help, I am attaching some non-debug-level information:
It looks like Docker is timing out while pulling down or starting some of the images (a commonly seen problem, e.g. Issues · docker/compose · GitHub); it probably gets overloaded at some point.
The common solution is to try setting, e.g., export DOCKER_CLIENT_TIMEOUT=120. I'm not 100% sure where the error is occurring, and whether having it in your environment when running the script is sufficient or whether it needs to be pushed into the install container itself, but I'd start there.
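Something along these lines, for example (just a sketch - I'm assuming the standard elastic-cloud-enterprise.sh installer from the docs; COMPOSE_HTTP_TIMEOUT is the related docker-compose variable, which may or may not be relevant to our installer):

export DOCKER_CLIENT_TIMEOUT=120   # client-side timeout in seconds for docker-compose/docker-py calls
export COMPOSE_HTTP_TIMEOUT=120    # companion docker-compose HTTP timeout; assumption that it helps here
bash elastic-cloud-enterprise.sh   # or however you invoke the installer per the documentation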
Good morning Alex,
thank you for your insight. I will test it as recommended, using version 1.0 which was just released.
I am using the recommended Docker version, i.e. Docker version 1.11.2, build b9f10c9.
I will start from scratch again with debug on and the new Docker client timeout.
Let's see how it works out.
cheers, I will keep you informed,
Stefano
Hi everyone,
I am still having problems with the installation, this time with the newest version of ECE (1.0.1).
I went down this path again on Ubuntu 16.04, after the preparation steps from the documentation,
and also added this parameter as I was told to do: export DOCKER_CLIENT_TIMEOUT=120
The results look like this. Can anyone give me a hint on how to get an installation that works on the first attempt? Some tips from the Elastic side perhaps? Thanks!
Elastic Cloud Enterprise Installer
Start setting up a new Elastic Cloud Enterprise installation by installing the software on your first host.
This first host becomes the initial coordinator and provides access to the Cloud UI, where you can manage your installation.
To learn more about the options you can specify, see the documentation.
NOTE: If you want to add this host to an existing installation, please specify the --coordinator-host and --roles-token flags
Applying Admin Console Elasticsearch index templates {}
Unhandled error. {}
-- An error has occurred in bootstrap process. Please examine logs --
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.5/site-packages/sh.py", line 1484, in output_thread
done = stream.read()
File "/usr/lib/python3.5/site-packages/sh.py", line 1974, in read
self.write_chunk(chunk)
File "/usr/lib/python3.5/site-packages/sh.py", line 1949, in write_chunk
self.should_quit = self.process_chunk(chunk)
File "/usr/lib/python3.5/site-packages/sh.py", line 1867, in process
return handler(chunk)
File "/usr/lib/python3.5/site-packages/sh.py", line 1106, in fn
return handler(chunk, *args)
File "/elastic_cloud_apps/bootstrap-initiator/bootstrap_initiator/logging.py", line 34, in process_info_log_output
puts("- {0}".format(line.split("]",3)[3].strip()))
IndexError: list index out of range
Errors have caused Elastic Cloud Enterprise installation to fail - Please check logs
Node type - initial
Elastic Cloud Enterprise installation completed successfully
...
As always, I can give more input and/or make a new test from scratch and/or share more details.
kind regards,
Stefano
Hi @smm - sorry you're still having problems, at least they seem to be different problems now!
There is a known issue with stack trace parsing in our installer; I see from our GitHub feed that we're fixing it at the moment.
In the meantime you should be able to see the actual root cause by heading over to /mnt/data/elastic/logs/bootstrap-logs/bootstrap.log - there should be an obvious stack trace at the end of that file.
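For example, something like:

sudo tail -n 200 /mnt/data/elastic/logs/bootstrap-logs/bootstrap.log   # sudo may or may not be needed, depending on file ownership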
We've seen something similar-looking a few times internally (*). It appears that the combination of the docker pull for the Elasticsearch and Kibana images and the subsequent cluster provisioning can take more than the (annoyingly hardwired!) 10-minute timeout, particularly on slower machines or if the Docker repo is under heavy load. We have an issue to fix this.
(*) That is, if the exception in bootstrap.log is a 600-second timeout.
This is only a problem when provisioning the genesis node - annoying though it is, it can be worked around by removing the failed install as per here and then rerunning the installation. The cleanup step doesn't purge the Docker images, so the provisioning step will go faster the second time.
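Roughly, the workaround looks like this (a sketch only - please verify against the cleanup steps in the page linked above before deleting anything; the exact commands and paths are assumptions based on the standard layout):

docker rm -f $(docker ps -aq)      # on a dedicated test host: remove all containers from the failed attempt, the images stay cached
sudo rm -rf /mnt/data/elastic/*    # wipe the on-disk state of the failed install (path per the documentation)
bash elastic-cloud-enterprise.sh   # rerun the installer; the cached images make provisioning faster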
Dear Alex,
thank you for your input! I will follow up on this topic tomorrow. Over the last week(s) I had quite a lot to do in/for my central Elastic environment here at Metro.
I will keep you updated.
kind regards,
Stefano