Filebeat unable to cope with incoming logs

Hello,

I have Filebeat processing 10-20GB of nginx log files. I can't seem to get past ~500 output events/s (measured over a 6-hour period) to Logstash. I've read existing topics and tuned parameters such as filebeat.spool_size, worker, bulk_max_size and pipelining. However, the performance is still way below my expectation of at least 1k output events/s.

My Filebeat version is 5.6.3 and Logstash is 5.6.2. Filebeat and Logstash sit on different nodes, but on the same local LAN. My Filebeat config is as below:

filebeat.spool_size: 409600
filebeat.idle_timeout: 180s
queue_size: 2000

output.logstash:
  worker: 20
  timeout: 600
  bulk_max_size: 20480
  pipelining: 1

Can someone please point me to the right direction? Would appreciate any help here.
Thanks.

In my experience, unless disks are slow (file access over the network and/or a VM), Filebeat is rarely the bottleneck. Other potential reasons for slowdowns are the network, encoding/decoding events in LS (consider load balancing to multiple LS instances, as sketched below), or the final sink/destination not being able to process this many events (LS and Beats will experience back-pressure if ES cannot index events fast enough -> overall system slow-down).
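
For example, load balancing from Filebeat to two Logstash instances could look roughly like this in filebeat.yml (the hostnames here are placeholders, not your actual nodes):

output.logstash:
  # hypothetical Logstash hosts; replace with your own
  hosts: ["logstash-a:5400", "logstash-b:5400"]
  loadbalance: true
  worker: 2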

If possible I would try with Filebeat/LS 6.x (with the default config). First get an idea of how fast Filebeat can process your logs by pushing to the console. For this, run Filebeat in the foreground like this:

rm -fR /tmp/testregistry; filebeat -e -v -c /path/to/filebeat.yml -E filebeat.registry_file=/tmp/testregistry -E output.logstash.enabled=false -E output.console.enabled=true | pv -Warl > /dev/null

The pv tool will print the current and the average event throughput to stderr. By pointing Filebeat at a temporary registry file, we process the log files from the beginning without overwriting the global state of already-published events.

If this goes well, we can do the same with filebeat->logstash. To do so, replace the output section of your Logstash pipeline with:

output {
  stdout { codec => dots }
}

and run logstash in foreground with ./bin/logstash -f test.conf | pv -War >/dev/null.

This gives you an idea of how many events Filebeat itself can process and how many events you can forward to LS + filters.


Thank you for the pointers. Will give it a go.

Thanks.

Hi Steffen,

My Filebeat is able to push the logs at a rate of ~2.5k events/s, whilst Logstash is only at about 500/s, which is the same throughput I'm seeing via Kibana. Before this, I had tuned the parameters as below, but to no avail:

  • jvm.options:

-Xms10g
-Xmx10g

  • logstash.yml:

pipeline.workers: 12
pipeline.batch.size: 1100

  • My config:

input {
  beats {
    port => 5400
  }
}

filter {
  grok {
    match => { 'message' => '^%{IPORHOST:clientip} - - \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|)" %{NUMBER:answer} (?:%{NUMBER:bytes}|-) (?:"(?:%{URI:referrer}|-))" %{QS:agent} "%{IPORHOST:proxyip}" "%{IPORHOST:hostname}" "%{GREEDYDATA:ident}"' }
  }
  mutate {
    convert => ["bytes", "integer"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
  useragent {
    source => "agent"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "weblog-%{+YYYY.MM.dd}"
    document_type => "nginx_logs"
  }
  stdout { codec => rubydebug }  # this was dots when tracing the Logstash output
}

Logstash, ES and Kibana are on the same VM node (12 cores with 50GB of memory). The load average has never gone beyond 4.5.

Thanks

Have you had the elasticsearch output enabled while running your tests? If so, run the tests without Elasticsearch.

Have you tried filebeat->Logstash without filters?
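
A minimal no-filter pipeline for that test (assuming your beats input stays on port 5400) would be just:

input {
  beats {
    port => 5400
  }
}
output {
  stdout { codec => dots }
}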

The processing pipeline can also be implemented using Elasticsearch Ingest Node. This would remove Logstash from your processing, reducing the number of Java processes running on the VM node.
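
A rough sketch of what that could look like on your versions (the pipeline name and the COMBINEDAPACHELOG pattern are only placeholders, not your exact grok):

PUT _ingest/pipeline/nginx-access
{
  "description": "parse nginx access logs",
  "processors": [
    { "grok": { "field": "message", "patterns": ["%{COMBINEDAPACHELOG}"] } },
    { "date": { "field": "timestamp", "formats": ["dd/MMM/YYYY:HH:mm:ss Z"] } }
  ]
}

Then point Filebeat directly at Elasticsearch and reference the pipeline in filebeat.yml:

output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: nginx-access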

Have you had a look at Filebeat modules? Filebeat has a module (a set of configurations) for nginx logs. Using the module, Filebeat will set up an ingest pipeline in Elasticsearch for you, as sketched below.
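
With Filebeat 6.x, enabling it is roughly (the log paths are placeholders for your own files):

filebeat modules enable nginx

and in modules.d/nginx.yml:

- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/access.log*"]

When Filebeat connects to Elasticsearch with the elasticsearch output enabled, the module loads its ingest pipeline for you.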

Hi Steffen,

After my reply, I duplicated another 2 VMs and set everything up from scratch, then re-checked and analyzed according to your steps. Interestingly, Logstash's throughput was approximately the same as Filebeat's (both at ~2.5k/s). So I started another Filebeat on a 3rd VM, and Logstash was then generating ~4.5k/s.

However, ES was still only indexing ~500/s. Then I began tweaking the number of shards, and boom! ES's throughput is now at ~4.5k/s (this was only after I increased the total number of shards from the default 5 to 7 via the template API). The load average is at 13.x on the 12-core node.

I'm very puzzled how the 2 additional index shards could have had such an impact.
Nonetheless, thank you very much for your help.

An index is split into multiple parts, kind of like partitioned tables in common databases. Adding more shards allows for more parallel processing of events in Elasticsearch. With multiple Elasticsearch nodes, shards become distributed across the nodes, so you get some horizontal scaling by adding shards. See the Shards section in the Basic Concepts doc.
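
For reference, the template change you describe would look roughly like this on 5.x (the template name is illustrative; on 6.x the "template" field is called "index_patterns"):

PUT _template/weblog
{
  "template": "weblog-*",
  "settings": {
    "index.number_of_shards": 7
  }
}

Note that the new shard count only applies to indices created after the template is updated, e.g. the next day's weblog-* index.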
