I want to use Fleet, but I have some doubts about the documented deployment method. I think it would've been better if it were possible to just install the elastic-agent as an RPM/DEB, configure a yaml file, and start the systemd service unit, letting the rest take care of itself. Of course you can install the elastic-agent package and then use the enroll switch, but then you don't benefit from all the Fleet features.
Instead, it's recommended to extract a tar file and run a one-liner, which then sets up a directory in /opt/Elastic and a systemd unit file.
So my first question is: is there a way to just install the RPM/DEB and then configure a yaml file with the enrollment token, IP and certificate of the ELK node? And if not, how do you manage this in an idempotent way? Do you script it? Or do you use Ansible/Salt/etc. to do this?
My best guess is now to do the following with Salt states (but might as well be Ansible or whatever).
Make sure the RPM/DEB elastic-agent is not installed
Check if /etc/systemd/system/elastic-agent.service and /opt/Elastic exist; if not, then:
Download the intended version of the Elastic Agent tar file to /tmp if it isn't there already
If a download happened, extract it in /tmp as well
Run the one-shot install command with the --non-interactive switch
If the tar file was downloaded/extracted, clean those files up afterwards
Make sure the elastic-agent.service is enabled and started
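The steps above could be sketched as a Salt state like the one below. This is only a rough draft, not a tested formula: the version, Fleet URL and enrollment token are placeholders, and the `unless`/`creates`/`onchanges` guards are what supply the idempotency.

```yaml
# Sketch only: pin the version and replace the Fleet URL/token placeholders.
elastic-agent-pkg-absent:
  pkg.removed:
    - name: elastic-agent

elastic-agent-tarball:
  file.managed:
    - name: /tmp/elastic-agent-8.13.4-linux-x86_64.tar.gz
    - source: https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.13.4-linux-x86_64.tar.gz
    - source_hash: https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.13.4-linux-x86_64.tar.gz.sha512
    - unless: test -d /opt/Elastic

elastic-agent-extract:
  archive.extracted:
    - name: /tmp
    - source: /tmp/elastic-agent-8.13.4-linux-x86_64.tar.gz
    - unless: test -d /opt/Elastic
    - onchanges:
      - file: elastic-agent-tarball

elastic-agent-install:
  cmd.run:
    - name: /tmp/elastic-agent-8.13.4-linux-x86_64/elastic-agent install --non-interactive --url=https://fleet.example.com:8220 --enrollment-token=REPLACE_ME
    - creates: /opt/Elastic
    - require:
      - archive: elastic-agent-extract

elastic-agent-cleanup:
  file.absent:
    - name: /tmp/elastic-agent-8.13.4-linux-x86_64.tar.gz
    - onchanges:
      - cmd: elastic-agent-install

elastic-agent-service:
  service.running:
    - name: elastic-agent
    - enable: True
    - require:
      - cmd: elastic-agent-install
```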
The upgrade feature is not supported for upgrading DEB/RPM packages or Docker images.
The confusion stems more from the setup you have to work with. Sure, it will work, if you carefully make sure it is idempotent and the desired state is ensured, something you shouldn't have to do because better methods have existed for quite some time. To me it looks like it was set up as a proof of concept and got adopted as a production solution without looking for proper integration in a Linux environment. That example alone is also very strange, almost as if it was thought up by someone who doesn't use Linux on a professional level.
Repositories have existed for a long time to make sure you install the latest released version, with a PGP check before installing as a bonus.
Installing something directly with dpkg or rpm is also a bit of an ugly thing to do, since those are not dependency resolvers, but I get that they are used in this step because of the first step.
Installing a service with a command is also strange; basically everything in Linux is a file, especially configuration files. A yaml file with that information would be easier to deploy across multiple nodes than running that command.
systemctl enable --now elastic-agent is sufficient; there's no need to run two separate commands.
The documentation talks as if the native Linux methods are "3rd party", but I think it's the other way around. Anyway, I'm not here to criticize, I'm trying to find a proper solution. I find it hard to believe I'm the first and only one who doesn't find this method production-suitable. It's as if the designers only thought of deploying it as a test, not actually managing it on a large cluster.
Large HPC (High Performance Computing) cluster. Making sure a config file is in the correct state is easier than checking that all the conditions match correctly and then either re-enrolling or installing an agent.
I can give it a try, but if I'm the first to bring this up, then I'm very much interested in how others have approached this.
I had a couple of issues last year when we planned to start using the Elastic Agent more, but they were more related to how data management and custom pipelines/templates/mappings work.
We planned to replace some of our many Logstash pipelines with Elastic Agent integrations, but since we had a lot of custom transformations, mappings and lifecycle policies, we looked at how to do that with the Elastic Agent, and it would add a lot more work to manage everything.
It would make it easier to add new data because of the integrations, but the management of this data would be way more complex. I made some suggestions on this github issue, and it seems that things are improving on this side.
But I agree that the documentation can be confusing sometimes.
We had a plan to replace thousands of Wazuh agents with Elastic Agents, which we stopped because of the issues mentioned before, but the steps we planned were something like this:
Create an agent policy for Windows Hosts
Create an agent policy for Linux hosts
Automate the download and deployment of the tar.gz version of the Agent using the corresponding enrollment token for each policy
It looks like this could work, but we never tested it because the plan was cancelled.
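The third step above could look something like the shell sketch below. The Fleet URL and tokens are placeholders; --url, --enrollment-token and --non-interactive are the documented `elastic-agent install` flags, and the /opt/Elastic/Agent check is an assumed idempotency guard.

```shell
#!/bin/sh
# Hedged sketch: fetch a pinned tar.gz, extract it, run the one-shot
# install with the policy's enrollment token, then clean up.
set -eu

# With DRY_RUN=1 the commands are printed instead of executed, so the
# script can be reviewed before it touches a host.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

agent_tarball() {
  # Artifact name for a pinned version (linux x86_64 build).
  printf 'elastic-agent-%s-linux-x86_64.tar.gz' "$1"
}

deploy_agent() {
  ver="$1" fleet_url="$2" token="$3"
  tarball="$(agent_tarball "$ver")"
  dir="elastic-agent-${ver}-linux-x86_64"

  # Idempotency guard: the installer creates /opt/Elastic/Agent, so an
  # existing directory means there is nothing to do.
  if [ -d /opt/Elastic/Agent ]; then
    echo "elastic-agent already installed"
    return 0
  fi

  run curl -sSL -o "/tmp/${tarball}" \
    "https://artifacts.elastic.co/downloads/beats/elastic-agent/${tarball}"
  run tar -xzf "/tmp/${tarball}" -C /tmp
  run "/tmp/${dir}/elastic-agent" install --non-interactive \
    --url="${fleet_url}" --enrollment-token="${token}"
  run rm -rf "/tmp/${tarball}" "/tmp/${dir}"
}

# Example dry run with placeholder values (one token per agent policy):
DRY_RUN=1
deploy_agent 8.13.4 https://fleet.example.com:8220 LINUX_POLICY_TOKEN
```

Picking the token per policy (Windows vs Linux) would just be a matter of passing a different third argument per host group.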
I think that the documentation could be greatly improved with an example of how to deploy the Elastic Agent using only the CLI, without the need to rely on Kibana.