I have been struggling to parse NetFlow from my Cisco FTD (cluster mode) at around 12k events per second, shipping through Filebeat -> Elasticsearch.
(Both Filebeat and Elasticsearch are version 7.6.0.)
I have 2 servers, 1 for filebeat the other for elasticsearch. They have the following hardware config:
Filebeat: 8x CPU, 16 GB memory, 250 GB storage (SSD).
Elastic: 16x CPU, 32 GB memory, 7 TB storage (SSD).
Networking is 10 Gb.
(probably a bit oversized but we can change that in the future)
On my Filebeat node I can see a constant flow of NetFlow packets coming from my firewall (I checked this with tcpdump -nni any port 2055). But when I do a tcpdump on the output side of Filebeat to Elastic (tcpdump -nni any port 9200), I can see that sometimes Filebeat stops sending data to my Elastic node altogether, but resumes after some time. You can see this in this picture:
When using htop during these outages I can see barely any usage on my CPU and memory. So it looks like Filebeat is dropping traffic to my Elastic node, but I can't figure out why. I have been toying around with the Filebeat output settings:
```
bulk_max_size: 4096
worker: 2
```
I tried smaller bulk sizes and more or fewer workers. I have also been testing Filebeat's queue.mem settings with multiple values:
```
queue.mem:
  events: 4096
  flush.min_events: 0
```
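For reference, here is a sketch of the relevant part of my filebeat.yml with both tuning areas together. The hostname is a placeholder, and the values are just what I am currently testing, not recommendations:

```
# filebeat.yml (excerpt) -- sketch of the knobs I have been varying
queue.mem:
  events: 4096          # size of the in-memory event queue
  flush.min_events: 0   # 0 = forward events immediately instead of waiting to batch

output.elasticsearch:
  hosts: ["http://elastic-node:9200"]   # placeholder host
  bulk_max_size: 4096   # max events per bulk request
  worker: 2             # concurrent workers sending bulk requests
```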
On the Elastic side I temporarily disabled replication and changed the index refresh interval to 30 seconds.
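For completeness, this is roughly the Dev Tools request I used for those index settings (the index pattern is a placeholder for my NetFlow indices):

```
PUT filebeat-*/_settings
{
  "index": {
    "number_of_replicas": 0,
    "refresh_interval": "30s"
  }
}
```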
Could anyone point me in the right direction or give me any insight?
Thanks in advance