I'm seeing errors like this one in the logs of some of my Fleet-managed Elastic Agents. After some testing, it looks like I'm not getting all of the log data I should be. For example, on my Windows domain controllers, the Elastic Agents are deployed with a policy that includes the IIS, Microsoft, System, and Windows integrations. On the 8th of this month, the agent captured changes made to a security group, and I can see them in the index in Kibana as well as in the appropriate dashboard. However, changes made to the same security group on the 13th are not showing in either the index or the dashboard.
When I widen the date range and filter on the host name instead of on the specific events, I can see that data from this host stopped coming in a few hours before the security group change was made.
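To pin down exactly when each data set stopped arriving from a host, one option is to ask Elasticsearch for the newest `@timestamp` per `event.dataset`. The sketch below only builds the search body; the host name `dc01`, the 30-day lookback, and running it against a `logs-*` index pattern (for example, `GET logs-*/_search` in Kibana Dev Tools) are assumptions to adjust for your environment.

```python
import json


def last_event_per_dataset_query(host_name: str, lookback: str = "now-30d") -> dict:
    """Build an Elasticsearch search body that returns, for one host,
    the most recent @timestamp seen for each event.dataset."""
    return {
        "size": 0,  # only the aggregation is needed, not the individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"host.name": host_name}},
                    {"range": {"@timestamp": {"gte": lookback}}},
                ]
            }
        },
        "aggs": {
            # one bucket per data set (e.g. system.security, system.system)
            "per_dataset": {
                "terms": {"field": "event.dataset", "size": 100},
                # newest event seen in each bucket
                "aggs": {"last_seen": {"max": {"field": "@timestamp"}}},
            }
        },
    }


if __name__ == "__main__":
    # "dc01" is a hypothetical host name; replace it with your domain controller.
    print(json.dumps(last_event_per_dataset_query("dc01"), indent=2))
```

Comparing the `last_seen` values across data sets shows whether only `system.security` went quiet or whether everything from the host stopped at the same time.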
Under Stack Management > Users, there is no fleet-server user in the list for me to grant index management privileges to. And since this was working previously, I'm not sure I would need to.
I'm not finding any other logs that indicate what may be happening or why the data flow stopped.
Any suggestions on how best to correct this? Where are the best places to look for logs to track this down?
After digging deep through my Windows event logs, I discovered the following:
McAfee pushed an update down from my ePO server to the Windows server. This update required the restart of a handful of Windows services.
I noted those services and restarted them one at a time myself to see the effect on the data stream.
My testing shows that restarting the Windows Event Log service causes events in the system.security data set (event.dataset: system.security) to stop being sent to the ELK stack.
The events start streaming again if the Elastic Agent service is restarted.
I don't imagine there are very many cases where this would occur, but a malicious user might shut down that service to try to hide their tracks.
It would be helpful if the agent's health status (healthy) could reflect whether particular data sets are still being received. There may already be a way to do this that I just haven't found yet.
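Until something like that is available, a rough external check could flag data sets that have gone silent. This is a minimal sketch, assuming you already have a search response from a terms aggregation named `per_dataset` with a `max` sub-aggregation named `last_seen` on `@timestamp` (both names are hypothetical and must match your query). It relies on the `max` aggregation reporting date values as epoch milliseconds in `value`. How you schedule it and alert on the result is left open; it is not part of Elastic Agent's own health reporting.

```python
from datetime import datetime, timedelta, timezone


def stale_datasets(agg_response: dict, max_silence: timedelta, now: datetime) -> list[str]:
    """Return the event.dataset names whose newest event is older than max_silence."""
    stale = []
    for bucket in agg_response["aggregations"]["per_dataset"]["buckets"]:
        # A max aggregation on a date field reports epoch milliseconds in "value".
        last_ms = bucket["last_seen"]["value"]
        last_seen = datetime.fromtimestamp(last_ms / 1000, tz=timezone.utc)
        if now - last_seen > max_silence:
            stale.append(bucket["key"])
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical response: system.security went quiet long before system.system.
    sample = {"aggregations": {"per_dataset": {"buckets": [
        {"key": "system.security", "last_seen": {"value": 1_700_000_000_000}},
        {"key": "system.system", "last_seen": {"value": 1_700_100_000_000}},
    ]}}}
    now = datetime.fromtimestamp(1_700_100_600, tz=timezone.utc)
    print(stale_datasets(sample, timedelta(hours=1), now))  # ['system.security']
```

The threshold should be set per data set in practice, since a quiet host may legitimately produce few security events for hours at a time.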
Of course, all this means is that I don't know for certain whether I am missing any data from the agent. I would assume that some metric data is also not being posted to the index.
I checked the metricbeat logs on the server in question and these errors are still being logged.