Hi,
I have a problem with logs being ingested into Elasticsearch with a delay. The delay is not there from the beginning; it starts at zero and keeps building up. Right now the delay is 1 day and 10 hours.
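To put a rough number on it: if the backlog grows more or less linearly, the ingest deficit can be estimated from the current delay and the time it took to build up. The elapsed time below is an assumption for illustration only, not a measured value.

```python
# Hypothetical sketch: estimate what fraction of the incoming rate
# Logstash is actually keeping up with, assuming linear backlog growth.
backlog_seconds = 34 * 3600        # current delay: 1 day 10 hours
elapsed_seconds = 7 * 24 * 3600    # assumed: the delay built up over one week
deficit_ratio = backlog_seconds / elapsed_seconds
print(f"Logstash keeps up with about {1 - deficit_ratio:.0%} of the incoming rate")
```

With these assumed numbers, roughly a fifth of the incoming rate is never drained, which matches a delay that grows steadily rather than staying constant.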
We checked the source device sending the logs (hob-fz-0001, 10.18.198.12): CPU is normal and interface bandwidth usage is OK (max observed 80 Mbps in, 105 Mbps out on a physical 1 Gbps interface).
We checked the physical interfaces of the Logstash VMware hosts: OK, no visible bandwidth issues; max usage seen was 150 Mbps in, 210 Mbps out.
Packet captures suggest that Logstash is throttling the TCP connection from the source log server.
Our setup is an ELK stack with 2 Logstash machines, 3 Elasticsearch nodes and 1 Kibana instance. We use virtual machines spread across 2 different virtualization hosts.
The Logstash configuration is completely default; nothing has been changed. This is our pipelines.yml:
- pipeline.id: zscaler
  path.config: "/logstash-7.11.2/config/syslog_zscaler.conf"
- pipeline.id: main
  path.config: "/logstash-7.11.2/config/syslog.conf"
The config for this specific source is the following:
filter {
  if [type] == "FortiAnalyzer" {
    if [ad.vd] == "GUEST" {
      mutate { add_field => { "[@metadata][drop]" => "drop" } }
    }
  }
}
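Note that, as written, the filter only tags GUEST events in [@metadata] and nothing later acts on that tag, so those events still reach the elasticsearch output. If the intent is to discard them, an explicit drop filter would be needed; a minimal sketch, assuming the tag really is meant to drop events:

```
filter {
  # Hypothetical addition: actually discard events tagged for dropping
  if [@metadata][drop] == "drop" {
    drop { }
  }
}
```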
output {
  # Forti Analyzer
  if [type] == "FortiAnalyzer" {
    if [deviceAction] == "accept" and [ad.vd] != "GUEST" {
      microsoft-logstash-output-azure-loganalytics {
        workspace_id =>
        workspace_key =>
        custom_log_table_name => "Logstash_Fortinet"
        plugin_flush_interval => 5
      }
    }
    elasticsearch {
      action => "index"
      hosts => ["10.18.193.68:9200","10.18.193.69:9200"]
      index => "fortianalyzer-%{+YYYY.MM.dd}"
      user => elastic
      password => "${ES_PWD}"
    }
  }
}
We use a separate input file:
input {
  tcp {
    port => 1504
    host => "0.0.0.0"
    codec => cef { delimiter => "\r\n" }
  }
  tcp {
    port => 1505
    host => "0.0.0.0"
    codec => cef { delimiter => "\r\n" }
  }
  udp {
    port => 514
    host => "10.38.193.50"
    codec => cef { }
  }
  tcp {
    port => 514
    host => "0.0.0.0"
    codec => cef { delimiter => "\r\n" }
  }
  tcp {
    port => 1516
    host => "0.0.0.0"
    codec => cef { delimiter => 'tz="+0000"' }
    type => "FortiAnalyzer"
  }
  tcp {
    port => 1518
    host => "0.0.0.0"
    codec => cef { delimiter => "\n" }
    # delimiter => "\r\n"
    type => "CheckPoint"
  }
}
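Worth adding: all of these inputs feed the same pipeline, so backpressure from any output in that pipeline (for example a slow Azure Log Analytics upload) will throttle every TCP port, including the FortiAnalyzer one. One thing we are considering is giving the FortiAnalyzer source its own pipeline so it cannot be throttled by the other sources' outputs; a sketch of the pipelines.yml we have in mind, where syslog_fortianalyzer.conf would be a new file holding only that input, filter and output:

```
# pipelines.yml — hypothetical split isolating the FortiAnalyzer source
- pipeline.id: zscaler
  path.config: "/logstash-7.11.2/config/syslog_zscaler.conf"
- pipeline.id: main
  path.config: "/logstash-7.11.2/config/syslog.conf"
- pipeline.id: fortianalyzer
  path.config: "/logstash-7.11.2/config/syslog_fortianalyzer.conf"
```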
In the screenshot you can see the delay.
The problem starts at the Logstash machine, so it does not originate between Logstash and the Elasticsearch nodes. I tried changing the Logstash settings to this:
pipeline.batch.delay: 100
pipeline.batch.size: 250
pipeline.workers: 4
queue.checkpoint.writes: 4096
but this had no effect on the ingestion delay. We have 10 different sources and this is the only one showing the problem. Any ideas on how to solve this?
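One thing we have not tried yet is a persistent queue, which would at least buffer bursts to disk instead of pushing backpressure onto the TCP connection (it would not fix a sustained throughput deficit, only absorb spikes). A sketch of the per-pipeline settings; the size value is a guess:

```
# pipelines.yml fragment — hypothetical: persistent queue for the main pipeline
- pipeline.id: main
  path.config: "/logstash-7.11.2/config/syslog.conf"
  queue.type: persisted
  queue.max_bytes: 4gb
```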
Kind regards,
Tom