Hi, I could use some information on how auditbeat works internally so we can try to figure out where the limitation is that is dragging our system down.
We are only using auditbeat to monitor 1 directory and 1 file (auditbeat.yml). The directory has around 1.83 million files, is 6 terabytes in size and 96% full (yes, we keep purging projects!), and sits on a separate mount from the rest of the system. The events are sent to Logstash on a remote server for processing.
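For context, the relevant part of our auditbeat.yml is essentially the stock file_integrity setup, roughly like the sketch below (the paths and the Logstash host are placeholders here, not our real ones):

```yaml
auditbeat.modules:
- module: file_integrity
  paths:
    - /data/projects                  # placeholder for the 6 TB mount with ~1.83 million files
    - /path/to/single/watched/file    # placeholder for the one individual file we also monitor
  recursive: true                     # the whole directory tree is watched

output.logstash:
  hosts: ["logstash.example.org:5044"]   # placeholder for the remote Logstash server
```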
It is a virtual system with 96 GB of RAM, 16 GB of swap, dozens of CPUs, and fast disks.
It is not noticeably impacting user performance as far as we can see, but it is impacting the two types of backup that we run.
We run a disaster recovery (DR) system: every evening rsync copies all the changes from the live server to the DR one. That normally takes around 5-6 hours, but with auditbeat running the run time easily doubles.
We are also using TSM for a more 'normal' backup, and this one has gone from 13-14 hours to the point where it never completes. If a run takes longer than 24 hours, the next backup does not start; TSM just finishes the current one, and the last run with auditbeat on took 72 hours. Basically it keeps taking longer and longer. Not good.
What I'm after is some guidance on how auditbeat works internally, so that we can work out what we can do to alleviate this bottleneck.
To me this looks like a disk issue, but I had thought that auditbeat was lightweight and ran in memory.
Any guidance gratefully received.
(P.S. I'm not a Linux performance expert, so I recognise I might need some guidance on what to check!)