Hello,
I deployed a WEC (Windows Event Collector) server with a customer to forward their Windows logs to our SIEM, following the Elastic WEC Server cookbook.
However, after we synchronized with the customer, he generated some events that we did not receive in the SIEM (user added to the Domain Admins group, deletion of security events, etc.).
After some debugging sessions, we found that the logs were present on the WEC server but were not transmitted to the rest of the chain.
The existing architecture is as follows:
Winlogbeat on WEC (7.16.3) -> Logstash (7.17.9) -> Kafka -> Logstash (7.17.9) -> Elasticsearch (7.17.9) -> Kibana (7.17.9)
I have direct access to all machines except the WEF, so I was able to check that there are no major issues on the Logstash nodes, Kafka, or Elasticsearch (no indexing failures, no events dropped by Logstash, no dropped packets).
I have attached the current Winlogbeat configuration, as well as the WEF subscriptions defined for each custom channel.
As you can see, there are many Windows sources (Security, Sysmon, ADFS, Defender, and so on).
We ran a test with the customer: I asked him to install Winlogbeat directly on his computer (which also sends its logs through the WEC), so we could see the difference when bypassing the WEC instance. We did get more logs than with the WEF collection, which in my opinion suggests two possible explanations:
- the WEF is not able to receive/process all the logs from the remote Windows machines
- Winlogbeat cannot keep up with the rate at which events are written to the evtx logs
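To tell these two explanations apart, one idea would be to count the same event ID over the same time window at each stage and see which hop loses events. Here is a minimal Python sketch of that comparison. The stage names, the `locate_loss` helper, and the counts are all hypothetical; the real counts would have to be collected separately (for example from the source host's event log, the WEC channel, and an Elasticsearch count query).

```python
from typing import Dict, List, Sequence, Tuple

# Pipeline stages in order. These names are illustrative labels only.
STAGES = ("source_host", "wec_channel", "elasticsearch")


def locate_loss(
    counts: Dict[str, int], stages: Sequence[str] = STAGES
) -> List[Tuple[str, str, int]]:
    """Return (upstream, downstream, missing) for each hop that lost events."""
    losses = []
    for upstream, downstream in zip(stages, stages[1:]):
        missing = counts[upstream] - counts[downstream]
        if missing > 0:
            losses.append((upstream, downstream, missing))
    return losses


if __name__ == "__main__":
    # Hypothetical counts for one event ID over one time window.
    example = {"source_host": 1000, "wec_channel": 1000, "elasticsearch": 940}
    for up, down, missing in locate_loss(example):
        print(f"{missing} events lost between {up} and {down}")
```

A loss between the source host and the WEC channel would point to the WEF side, while a loss between the WEC channel and Elasticsearch would point to Winlogbeat or something further down the chain.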
I tuned Winlogbeat a bit (batch_size on the custom channels, forwarded set to true for the custom channels), but I may be missing an obvious parameter that would solve this problem.
I'm fairly sure a second WEC would help split the load and reduce the loss, but I'm looking for "solid" evidence to confirm this intuition.
Has anyone already faced this kind of problem? Would the Elastic Agent or a custom Filebeat on the WEC help to get some metrics about this?
Thanks in advance, Ryu