This morning we had two different groups setting up a new ECE lab, one on prem and one in Azure. Both are hitting the same error with the install script. Both are installing on RHEL 7.7 hosts that have been prepared according to the documentation, and both groups had successful installs a few weeks ago in a smaller POC. It looks like it's a Python error:
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
I noticed something about python 3 later on in the error and I'm pretty sure RHEL 7.7 comes with python 2.7. Something must have changed in the install script since we last ran it, and now it doesn't work on systems prepared according to documentation. Any thoughts would be appreciated. Full error below.
I believe I've seen this twice, once was because selinux was enabled, and the other was a permissions issue on /mnt/data/elastic / /mnt/data/docker (or whatever root directories were being used for ECE/docker)
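For anyone wanting to rule out those two causes quickly, here is a minimal sketch in Python that reports the SELinux mode and the ownership and permissions of the data directories. It is not part of the ECE installer. It assumes the standard selinuxfs location /sys/fs/selinux/enforce and the example directories named above; substitute whatever root directories your install actually uses.

```python
import os
import stat

def selinux_mode(enforce_file="/sys/fs/selinux/enforce"):
    """Return 'enforcing', 'permissive', or 'disabled/unknown'.

    Assumption: selinuxfs is mounted at /sys/fs/selinux, as on stock RHEL 7.
    If the file cannot be read, SELinux is treated as disabled or unknown.
    """
    try:
        with open(enforce_file) as fh:
            value = fh.read().strip()
    except OSError:
        return "disabled/unknown"
    return "enforcing" if value == "1" else "permissive"

def describe_dir(path):
    """Return (octal mode, uid, gid) for path, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (oct(stat.S_IMODE(st.st_mode)), st.st_uid, st.st_gid)

if __name__ == "__main__":
    print("SELinux:", selinux_mode())
    # Example paths taken from the post above; adjust to your own layout.
    for path in ("/mnt/data/elastic", "/mnt/data/docker"):
        print(path, describe_dir(path))
```

Comparing the output on a host where the install succeeds with one where it fails should show whether either of these differs.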
Oddly enough it just started working perfectly again. We have a hunch that there was a commit to the github that the script pulls from that was bad and they backed it out.
And it's magically not working again in both Azure and on prem. It downloads the container images and then throws that error. If I re-run the script, it hits the error immediately. If I remove the images with docker rmi and then re-run, it downloads them again and then fails. We might move to an offline install to eliminate variables, as I have no idea why it worked, didn't work, worked, and doesn't work again.
Are you sure you don't have some script/part of your host install infra that is messing with the permissions?
This has been seen twice before that I am aware of, and both times it was permissions related (once chmod once selinux) ... the fact that it happens intermittently would make me wonder if it's a race condition with some other script that runs?
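One way to test the race-condition theory is to poll the directories while the installer runs and log any change in mode or ownership. The following is only a sketch under that assumption: the function names are made up for illustration, the paths are the example ones from earlier in the thread, and it will not catch SELinux label changes, only classic permission bits and uid/gid.

```python
import os
import stat
import time

def snapshot(paths):
    """Map each path to (octal mode, uid, gid), or None if it is missing."""
    result = {}
    for path in paths:
        try:
            st = os.stat(path)
            result[path] = (oct(stat.S_IMODE(st.st_mode)), st.st_uid, st.st_gid)
        except FileNotFoundError:
            result[path] = None
    return result

def changed(before, after):
    """Return the paths whose recorded state differs between two snapshots."""
    return sorted(p for p in before if before[p] != after.get(p))

def watch(paths, interval=5.0, rounds=12):
    """Poll the paths and print a timestamped line whenever something changes."""
    previous = snapshot(paths)
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot(paths)
        for path in changed(previous, current):
            print(time.strftime("%H:%M:%S"), path, previous[path], "->", current[path])
        previous = current
```

Running something like watch(["/mnt/data/elastic", "/mnt/data/docker"]) in a second terminal during the install would show whether another script is changing the directories underneath the installer.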
Alex - do we need to remove the --selinux from the docker config file? To follow on from what Andrew was talking about - we have created an offline environment and are following the directions from the website. We have done all the pre-reqs on all the systems, created the tar files, and moved them to all the machines. We are hoping that the directions work this morning.
I agree we don't document it very well currently, partly because we are still working towards documenting the policy settings - it's not that it doesn't work, it's that we do not currently know which policy settings break it
Yep, I found that little note about SELinux after backing out of the RHEL-specific stuff. The trouble is we will have to lock down these hosts to CIS benchmark standards, which require SELinux to be set to enforcing. If we have to leave a benchmark disabled, it's a whole thing. Some hosts install just fine with Enforcing, but most fail. We've probably got a good amount of work ahead to figure out what is needed to make ECE and SELinux play well on RHEL. For now we are working in a way that is, albeit, insecure according to CIS.
Digging some more into this - we do apparently have a working CentOS 7.6 / ECE 2.3.3 setup with SELinux internally ... I'm working to understand how far away this is from being something we can publish and maintain. It might be worth opening this request as a support ticket.