Install script failing with python error (new)

This morning we had two different groups setting up a new ECE lab on prem and in Azure. Both are hitting the same error with the install script. We are both installing on RHEL 7.7 that has been prepared according to the documentation. Both groups had previous successful installs a few weeks ago in a smaller POC. It looks like it's a Python error:

Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'

I noticed something about Python 3 later on in the error, and I'm pretty sure RHEL 7.7 comes with Python 2.7. Something must have changed in the install script since we last ran it, and now it doesn't work on systems prepared according to the documentation. Any thoughts would be appreciated. Full error below.

Unable to find image 'docker.elastic.co/cloud-enterprise/elastic-cloud-enterprise:2.4.3' locally
Trying to pull repository docker.elastic.co/cloud-enterprise/elastic-cloud-enterprise ...
2.4.3: Pulling from docker.elastic.co/cloud-enterprise/elastic-cloud-enterprise
100c7683cffa: Pull complete
6a7b32ded68c: Pull complete
Digest: sha256:bd081297a163847271fa9e7953f72a4d7a683a5d3953eefee15576c1b68df109
Status: Downloaded newer image for docker.elastic.co/cloud-enterprise/elastic-cloud-enterprise:2.4.3
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'

Current thread 0x00007fc2018f7740 (most recent call first):
/usr/bin/elastic-cloud-enterprise-installer: line 4: 8 Aborted (core dumped) python3 /elastic_cloud_apps/bootstrap-initiator/initiator.py

I believe I've seen this twice: once because SELinux was enabled, and once because of a permissions issue on /mnt/data/elastic / /mnt/data/docker (or whatever root directories were being used for ECE/docker).
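Both suspects can be checked quickly on a host. What follows is a minimal sketch, not an official ECE diagnostic: `report_host_state` is a name made up for illustration, and the two paths are simply the ones mentioned above, so substitute whatever root directories your install uses.

```shell
# Hypothetical helper: print the SELinux mode plus ownership, permissions
# and (where supported) SELinux context of each directory given as argument.
report_host_state() {
    # getenforce prints Enforcing, Permissive or Disabled; it may be
    # missing on hosts without the SELinux utilities installed.
    if command -v getenforce >/dev/null 2>&1; then
        echo "selinux: $(getenforce)"
    else
        echo "selinux: getenforce not found"
    fi

    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # -d lists the directory itself; -Z adds the security context.
            # Fall back to plain ls -ld if this ls does not accept -Z.
            ls -ldZ "$dir" 2>/dev/null || ls -ld "$dir"
        else
            echo "missing: $dir"
        fi
    done
}

# Paths assumed from the post above; adjust to your ECE/docker roots.
report_host_state /mnt/data/elastic /mnt/data/docker
```

Comparing this output between a host where the installer succeeds and one where it fails may show which of the two causes applies.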

Oddly enough it just started working perfectly again. We have a hunch that there was a commit to the github that the script pulls from that was bad and they backed it out.

a commit to the github that the script pulls from that was bad and they backed it out.

(Not sure if I understood, but definitely not something that happened on our side. Changing those images requires a nuclear-launch-code-like process :) )

And it's magically not working again, both in Azure and on prem. It downloads the container images and then throws that error. If I re-run the script, it hits the error immediately. If I remove the images with docker rmi and then re-run, it downloads them again and then fails. We might move to an offline install to eliminate variables, as I have no idea why it worked, didn't work, worked, and doesn't work again.

Are you sure you don't have some script/part of your host install infra that is messing with the permissions?

This has been seen twice before that I am aware of, and both times it was permissions related (once chmod once selinux) ... the fact that it happens intermittently would make me wonder if it's a race condition with some other script that runs?

Alex - do we need to remove the --selinux from the docker config file? To follow on what Andrew was talking about - we have created an offline environment and are following the directions from the website. We have done all the pre-reqs on all the systems, created the tar files, and moved them to all the machines. We are hoping that the directions work this morning.

Let us know your thoughts - Linda

I believe we have only seen ECE work on RH 7 with selinux disabled (at which point --selinux is moot)

Checked both things and doing a setenforce 0 allowed the script to run. I don't think that step was in the environment documentation for RHEL.
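For reference, `setenforce 0` only switches SELinux to permissive mode until the next reboot; the mode applied at boot comes from the `SELINUX=` line in /etc/selinux/config. The sketch below shows both values side by side. It assumes the standard RHEL config location, and `show_selinux_modes` is a hypothetical helper name, not something shipped with ECE.

```shell
# Hypothetical helper: show the runtime SELinux mode and the mode that
# will be applied at the next boot. Takes an optional config file path.
show_selinux_modes() {
    config="${1:-/etc/selinux/config}"

    if command -v getenforce >/dev/null 2>&1; then
        echo "runtime: $(getenforce)"
    else
        echo "runtime: unknown"
    fi

    if [ -r "$config" ]; then
        # Matches only the SELINUX= line, not SELINUXTYPE= or comments.
        echo "boot: $(sed -n 's/^SELINUX=//p' "$config")"
    else
        echo "boot: unknown"
    fi
}

show_selinux_modes

# To switch to permissive until the next reboot (run as root):
#   setenforce 0
# To switch back to enforcing:
#   setenforce 1
```

If the boot value still says enforcing, the host will go back to enforcing after a reboot.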

I agree we don't document it very well currently. We are working towards documenting the policy settings: it's not that it doesn't work, it's that we do not currently know which policy settings break it.

So soon the documentation should improve

Currently the only mention of it is https://www.elastic.co/guide/en/cloud-enterprise/2.4/ece-prereqs-software.html where it sort of implies you can use SELinux provided you configure it "correctly" (for some unspecified concept of "correct"!)

Yep, I found that little note about SELinux after backing out of the RHEL-specific stuff. The trouble is that we will have to lock down these hosts to CIS benchmark standards, which require SELinux to be set to enforcing. If we have to leave a benchmark disabled, it's a whole thing. Some hosts install just fine with Enforcing, but most fail. We've probably got a good amount of work to do to figure out what is needed to make ECE and SELinux play well on RHEL. For now we are working in a way that is, albeit, insecure according to CIS.
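One possible starting point for that work is to collect the denials SELinux logs while the installer runs and group them by process and target. This is a minimal sketch under assumptions: auditd is running and writing to /var/log/audit/audit.log, and `summarize_avc` is a made-up helper name. Which denials ECE actually triggers is not known from this thread.

```shell
# Hypothetical helper: read audit log lines on stdin and print a count of
# AVC denials grouped by command, target context and object class.
# Lines without comm=, tcontext= and tclass= fields are skipped.
summarize_avc() {
    grep 'avc:.*denied' \
        | sed -n 's/.*comm="\([^"]*\)".*tcontext=\([^ ]*\).*tclass=\([^ ]*\).*/\1 \2 \3/p' \
        | sort | uniq -c | sort -rn
}

# Summarize the default audit log if it is readable (usually needs root).
if [ -r /var/log/audit/audit.log ]; then
    summarize_avc < /var/log/audit/audit.log
fi

# Alternative input, limited to roughly the last ten minutes:
#   ausearch -m AVC -ts recent | summarize_avc
# audit2allow can turn collected denials into candidate policy rules for
# review; do not load generated rules without reading them:
#   audit2allow -a
```

Running the installer with SELinux in permissive mode still logs the denials, so the summary can be gathered without the install failing partway.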

Thanks for your insights Alex!

Digging some more into this - we do apparently have a working CentOS 7.6 / ECE 2.3.3 setup with SELinux internally ... I'm working to understand how far away this is from being something we can publish and maintain. It might be worth your opening this request as a support ticket.
