Overcoming 2600 msg/sec with Logstash/Filebeat

I'm currently testing file-ingest performance but I'm not getting much throughput right now: about 2500-2600 msg/sec through filebeat (1.2.3) -> logstash (2.3.2) -> elasticsearch (2.3.3), all with default configurations.

I've got a directory where I extract my logs; they get picked up by filebeat with a fairly generic configuration. The files are about 15 GB in total, around 45 million lines, and each event is about 500 bytes.

The server running filebeat is using about 10% CPU (a VM with only 1 CPU, I just noticed).

Where could the bottleneck be here?

filebeat:
  publish_async: true
  prospectors:
    -
      paths:
        - "/Volumes/Disk1/spool/*.log"
      input_type: custom_ats_2
      type: custom_ats_2
      document_type: custom_ats_2

      tail_files: false
      close_older: 1h
      tags:
        sourcetype: custom_ats_2
      fields_under_root: true

output:
  logstash:
    hosts:
      - logstash:5044
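
(I haven't tuned any filebeat knobs yet. For reference, the 1.x spooler and logstash output expose a few that can affect throughput; this is a sketch with option names from the 1.x docs and purely illustrative values, not what I'm running:)

filebeat:
  spool_size: 4096       # events buffered before publishing (1.x default: 2048)
  idle_timeout: 5s       # flush the spooler even when it isn't full
output:
  logstash:
    hosts:
      - logstash:5044
    worker: 2            # parallel connections per host
    bulk_max_size: 2048  # max events per published batch
    loadbalance: true    # spread batches across connections/hosts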

The container running logstash is fairly powerful: 12 cores, 192 GB RAM. The logstash process uses about 160% CPU.

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => [ "message", "%{NUMBER:timestamp}" ]
  }
  kv {
    source => "message"
    include_keys => ["chi", "phn", "shn", "url", "cqhm", "cqhv", "pssc", "ttms", "b", "sssc", "sscl", "cfsc", "pfsc", "crc", "phr", "uas", "range"]
  }
  date {
    match => [ "timestamp", "UNIX" ]
  }

  metrics {
    meter => "events"
    add_tag => "metric"
  }

  mutate {
    remove_field => [ "message", "source", "beat", "beat.hostname", "input_type", "count", "offset" ]
    remove_tag => ["beat"]
    convert => {
      "ttms" => "integer"
      "b" => "integer"
      "sscl" => "integer"
    }
  }
}

output {
  if "metric" in [tags] {
    file {
      path => "/var/log/logstash/metrics.log"
      codec => line {
        format => "%{[time]} rate: %{[events][rate_1m]}"
      }
    }
  } else {
    elasticsearch {
      hosts => ["hpc01","hpc02","hpc03","hpc04"]
    }
  }
}
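
One cheap thing worth trying in the filter above: anchoring the grok pattern. An unanchored pattern can scan and backtrack across the whole line before matching or failing; pinning it to the start of the line is a common grok performance tip (an assumption here, not something I've measured in this setup):

grok {
  # "^" anchors the pattern to the start of the line
  match => [ "message", "^%{NUMBER:timestamp}" ]
}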

Here are the metrics I'm gathering (the time field doesn't expand; I still need to fix that).
[root@cdn1cdlogp0001 ~]# tail -f /var/log/logstash/metrics.log
%{[time]} rate: 2549.4839094518907
%{[time]} rate: 2553.3790329886356
%{[time]} rate: 2554.6280165500852
%{[time]} rate: 2558.8154491438468
%{[time]} rate: 2561.660632737703
%{[time]} rate: 2564.4222480654926
%{[time]} rate: 2553.2906517247943
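
(On the unexpanded time field: the event emitted by the metrics filter has no field named time, only the standard @timestamp, so %{[time]} never resolves. A sketch of a fix using Logstash's sprintf date syntax, which formats @timestamp:)

codec => line {
  format => "%{+yyyy-MM-dd HH:mm:ss} rate: %{[events][rate_1m]}"
}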

When I removed Elasticsearch from the logstash output, I got better numbers, but logstash process usage rose to 500%.
%{[time]} rate: 4299.910569505572
%{[time]} rate: 4364.3139484306475
%{[time]} rate: 4401.899953858465
%{[time]} rate: 4465.296741188148
%{[time]} rate: 4506.16230143131
%{[time]} rate: 4525.066816024065
%{[time]} rate: 4572.954869346784
%{[time]} rate: 4596.049650849632
%{[time]} rate: 4627.30831486682
%{[time]} rate: 4627.171725551646

A few changes improved end-to-end throughput to about 5,000 events/sec. Of course, CPU usage more than doubled everywhere (the exact settings are sketched after this list):

- Logstash: LS_OPTS="-w 12 -b 2000"
- Elasticsearch output: workers => 4
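
Concretely, the two changes look like this (the sysconfig path is an assumption; it depends on how Logstash was installed):

# e.g. /etc/sysconfig/logstash, or passed on the command line
LS_OPTS="-w 12 -b 2000"   # 12 pipeline workers, batch size 2000

# elasticsearch output section with 4 output workers
elasticsearch {
  hosts => ["hpc01","hpc02","hpc03","hpc04"]
  workers => 4
}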

I would say filebeat is the bottleneck. I was able to push 50,000 EPS through Logstash with a minimal filter in my performance tests. One way to check is to remove all Logstash filters and measure the throughput.
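
A minimal sketch of such a test, using the dots codec so nothing gets indexed (the pv trick assumes pv is installed; the codec prints one dot, i.e. one byte, per event, so pv's byte rate equals the event rate):

# throughput-test.conf: beats in, no filters, one dot per event out
input {
  beats {
    port => 5044
  }
}
output {
  stdout { codec => dots }
}

# run it and read events/sec off pv's average rate
bin/logstash -f throughput-test.conf | pv -abt > /dev/null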


Removing the filters didn't change the numbers with filebeat in the picture...

However, once I removed Filebeat and used the Logstash file input, I could get more performance out of it.

I think the biggest bottleneck is the default single worker (1) on the Logstash elasticsearch output. I'll need to retest with filebeat.

Filebeat -> Logstash (kv) -> Elasticsearch: 5,000 EPS
Logstash (file input, kv, no output): 15,600 EPS
Logstash (file input, no kv, no output): 20,000 EPS
Logstash (file input, kv, es 4 workers) -> Elasticsearch: 7,800 EPS
Logstash (file input, kv, es 8 workers) -> Elasticsearch: 12,000 EPS

It's getting better... It looks like I'll need to do more testing, introducing Redis/Kafka and running separate Logstash instances for ingest and indexing.
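
A rough sketch of that split, assuming a Redis list as the broker (the broker host and list key are illustrative names, not anything I've deployed):

# shipper: accept beats, push raw events to the broker, no filters
input {
  beats {
    port => 5044
  }
}
output {
  redis {
    host => "broker01"     # illustrative broker host
    data_type => "list"
    key => "filebeat"      # illustrative list key
  }
}

# indexer: pop from the broker, run the kv/date/mutate filters, index
input {
  redis {
    host => "broker01"
    data_type => "list"
    key => "filebeat"
  }
}
# ... same filter block as above ...
output {
  elasticsearch {
    hosts => ["hpc01","hpc02","hpc03","hpc04"]
    workers => 4
  }
}

The idea is that shippers stay cheap, the broker absorbs bursts, and indexers can be scaled horizontally to match Elasticsearch's indexing capacity.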
