Hello,
Environment information
Kubernetes RKE2 Cluster v1.27.3 (DISA STIG Hardened)
Ironbank ECK-operator image 2.9.0
I managed to get the agents running and reporting a "Healthy" status, but I wanted to post here to make sure I wasn't missing something, and to raise awareness if this truly is an issue.
I followed this guide with very minor changes: Quickstart | Elastic Cloud on Kubernetes [2.9] | Elastic
In the agent manifest files I added the following, because the default is a hostPath volume and that doesn't work even when running as root. This isn't really a problem, as it has already been identified in another issue.
volumes:
- name: agent-data
  emptyDir:
    sizeLimit: 500Mi
Once the Fleet agents are deployed, they keep cycling through CrashLoopBackOff / Error states.
The first issue results in the following error:
/usr/bin/tini permission denied. no such file or directory.
After examining the Dockerfile on Ironbank, it appears the path should be "/tinit", so I edited the agent deployment command as follows:
/tinit -- /usr/local/bin/docker-entrypoint -e
The first issue is now resolved.
The following error is logged on the fleet-agent after this:
{"log.level":"error","@timestamp":"2023-09-18T11:29:22.401Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":991},"message":"Spawned new unit fleet-server-default-fleet-server: Failed: execution of component prevented: cannot be writeable by group or other","log":{"source":"elastic-agent"},"component":{"id":"fleet-server-default","state":"FAILED"},"unit":{"id":"fleet-server-default-fleet-server","type":"input","state":"FAILED"},"ecs.version":"1.6.0"}
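For anyone hitting the same "cannot be writeable by group or other" message, a rough way to see which files trip that check is to list everything under the agent data directory that has the group or other write bit set. This is only a sketch: the function name list_group_other_writable is made up for illustration, the /opt/elastic-agent/data path is taken from the chmod commands in this post, and it assumes a find binary is available in the image.

```shell
#!/bin/sh
# Hypothetical helper: print every non-symlink entry under the given
# directory that is writable by group or by other.
list_group_other_writable() {
    # -perm -g+w matches entries with the group write bit set,
    # -perm -o+w matches entries with the other write bit set.
    find "$1" \( -perm -g+w -o -perm -o+w \) ! -type l -print
}

# Assumed location of the agent data, based on the paths in this post.
data_dir="/opt/elastic-agent/data"
if [ -d "$data_dir" ]; then
    list_group_other_writable "$data_dir"
fi
```

Anything it prints is a candidate for the chmod fix; an empty result would suggest the permissions are already tight enough.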
To fix this issue, as well as other similar ones, I changed the permissions on the various components in the fleet-server agent pod and all the elastic agent pods.
I added the following at the beginning of the script in the deployment command:
chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/metricbeat
chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/fleet-server
chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/auditbeat
chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/cloudbeat
chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/endpoint-security
chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/heartbeat
chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/packetbeat
chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/osquerybeat
chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/filebeat
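The same chmod calls can probably be written as a loop, so the "dc443b" suffix (which looks build-specific) doesn't have to be hard-coded. This is a sketch rather than a tested fix: fix_component_perms is a hypothetical name, and the glob assumes the same directory layout as the paths above.

```shell
#!/bin/sh
# Hypothetical helper: apply chmod 755 to the same component names as the
# commands above, skipping any that are not present in this image.
fix_component_perms() {
    components_dir="$1"
    for name in metricbeat fleet-server auditbeat cloudbeat endpoint-security \
                heartbeat packetbeat osquerybeat filebeat; do
        target="$components_dir/$name"
        # Skip names that do not exist under this components directory
        [ -e "$target" ] || continue
        chmod 755 "$target"
    done
}

# Assumed layout: the glob stands in for the build-specific directory name.
for components in /opt/elastic-agent/data/elastic-agent-*/components; do
    fix_component_perms "$components"
done
```

The loop only touches the nine names listed in this post, so other files under components keep whatever permissions the image ships with.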
After these changes, the agents appear to be in a happy/healthy state.
I'm not sure if I did anything wrong here, as this is my first time deploying Elastic, but it took me an additional day or two to work out these kinks for something that should have been a lot faster. Is this just Ironbank-specific?