Ironbank Elastic Agent 8.9.0 Issues - tinit, group writeable components

Hello,

Environment information

Kubernetes RKE2 Cluster v1.27.3 (DISA STIG Hardened)
Ironbank ECK-operator image 2.9.0

I managed to get the agents running and reporting a "Healthy" status, but I wanted to post here to make sure I wasn't missing something, and to raise awareness in case this really is an issue.

I followed this guide with only very minor changes: Quickstart | Elastic Cloud on Kubernetes [2.9] | Elastic

I added the following to the agent manifest files, since the default volume is a hostPath and it doesn't work even as root. This isn't really a problem on its own, as it was already identified in another issue.

        volumes:
        - name: agent-data
          emptyDir:
            sizeLimit: 500Mi
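
For context, here's roughly where that override sits in the Agent manifest from the quickstart (field names per the ECK Agent CRD; everything around the volumes block is illustrative):

    apiVersion: agent.k8s.elastic.co/v1alpha1
    kind: Agent
    metadata:
      name: elastic-agent
    spec:
      version: 8.9.0
      daemonSet:
        podTemplate:
          spec:
            # Replace the default hostPath agent-data volume with an emptyDir
            volumes:
            - name: agent-data
              emptyDir:
                sizeLimit: 500Mi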

Once the fleet agents are deployed, the pods repeatedly go into CrashLoopBackOff / Error states.

The first issue results in the following error:

    /usr/bin/tini: permission denied. no such file or directory.

After examining the Dockerfile on Ironbank, it appears the path should be "/tinit", so I edited the agent deployment as follows:

    /tinit -- /usr/local/bin/docker-entrypoint -e
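
In the manifest, that override looks roughly like this (a sketch; the container name agent is my assumption based on the pod spec ECK generates):

    spec:
      daemonSet:
        podTemplate:
          spec:
            containers:
            - name: agent
              # The Ironbank image ships tini at /tinit rather than /usr/bin/tini
              command:
              - /tinit
              - "--"
              - /usr/local/bin/docker-entrypoint
              - -e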

The first issue is now resolved.

After this, the following error is logged on the fleet-server agent:

[{"log.level":"error","@timestamp":"2023-09-18T11:29:22.401Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":991},"message":"Spawned](mailto:%7B%22log.level%22:%22error%22,%22@timestamp%22:%222023-09-18T11:29:22.401Z%22,%22log.origin%22:%7B%22file.name%22:%22coordinator/coordinator.go%22,%22file.line%22:991%7D,%22message%22:%22Spawned) new unit fleet-server-default-fleet-server: Failed: execution of component prevented: cannot be writeable by group or other","log":{"source":"elastic-agent"},"component":{"id":"fleet-server-default","state":"FAILED"},"unit":{"id":"fleet-server-default-fleet-server","type":"input","state":"FAILED"},"ecs.version":"1.6.0"}

To fix this, as well as other similar errors, I changed the permissions on the various components in the fleet-server agent pod and in all the elastic agent pods.

I added the following at the beginning of the script in the deployment commands:

    chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/metricbeat
    chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/fleet-server
    chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/auditbeat
    chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/cloudbeat
    chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/endpoint-security
    chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/heartbeat
    chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/packetbeat
    chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/osquerybeat
    chmod 755 /opt/elastic-agent/data/elastic-agent-dc443b/components/filebeat
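
One way to wire these in is to wrap the entrypoint in a shell script in the container command (a sketch, shown for the fleet-server Deployment, same idea for the agent DaemonSet; the bash wrapper and the container name agent are my assumptions, and the elastic-agent-dc443b directory name embeds this build's commit hash, so it changes between versions):

    spec:
      deployment:
        podTemplate:
          spec:
            containers:
            - name: agent
              command:
              - /bin/bash
              - -c
              - |
                # Clear the group/other write bits that the agent refuses to run with
                for c in metricbeat fleet-server auditbeat cloudbeat endpoint-security \
                         heartbeat packetbeat osquerybeat filebeat; do
                  chmod 755 "/opt/elastic-agent/data/elastic-agent-dc443b/components/$c"
                done
                # Hand off to the Ironbank entrypoint (tini lives at /tinit in this image)
                exec /tinit -- /usr/local/bin/docker-entrypoint -e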

After these changes, the agents appear to be in a happy/healthy state.

Not sure if I did anything wrong here (this is my first time deploying Elastic), but it took an additional day or two to work out these kinks for something that should have been a lot faster. Is this just Ironbank-specific?
