Overcome 2600 msg/seconds with Logstash/Filebeat

I'm currently testing performance of file ingest but I'm not getting much right now. About 2500-2600 msg/secs. filebeat (1.2.3) -> logstash (2.3.2) -> elastic (2.3.3) with default configurations.

I've got a directory where I extract my logs, they get picked up by filebeat which a fairly generic configuration. These files are about 15GB in size, 45 million lines and each events are about 500 bytes.

The server with filebeat is using about 10% CPU (VM with only 1 CPU, just noticed).

Where could be the bottleneck here?

filebeat:
publish_async: true
prospectors:
-
paths:
- "/Volumes/Disk1/spool/*.log"
input_type: custom_ats_2
type: custom_ats_2
document_type: custom_ats_2

  tail_files: false
  close_older: 1h
  tags:
    sourcetype: custom_ats_2
  fields_under_root: true

output:
logstash:
hosts:
- logstash:5044

The container running logstash is fairly powerful... 12 cores, 192GB ram. Process usage about 160% for logstash.

input {
beats {
port => 5044
}
}

filter {
grok {
match => [ "message", "%{NUMBER:timestamp}" ]
}
kv {
source => "message"
include_keys => ["chi", "phn", "shn", "url", "cqhm", "cqhv", "pssc", "ttms", "b", "sssc", "sscl", "cfsc", "pfsc", "crc", "phr", "uas", "range"]
}
date {
match => [ "timestamp", "UNIX" ]
}

metrics {
meter => "events"
add_tag => "metric"
}

mutate
{
remove_field => [ "message", "source", "beat", "beat.hostname","input_type", "count", "offset" ]
remove_tag => ["beat"]
convert => {
"ttms" => "integer"
"b" => "integer"
"sscl" => "integer"
}
}
}

output {
if "metric" in [tags] {
file {
path => "/var/log/logstash/metrics.log"
codec => line {
format => "%{[time]} rate: %{[events][rate_1m]}"
}
}
} else {
elasticsearch {
hosts => ["hpc01","hpc02","hpc03","hpc04"]
}
}
}

Here's the metrics I'm gathering (which my time field doesn't expand, need to fix).
[root@cdn1cdlogp0001 ~]# tail -f /var/log/logstash/metrics.log
%{[time]} rate: 2549.4839094518907
%{[time]} rate: 2553.3790329886356
%{[time]} rate: 2554.6280165500852
%{[time]} rate: 2558.8154491438468
%{[time]} rate: 2561.660632737703
%{[time]} rate: 2564.4222480654926
%{[time]} rate: 2553.2906517247943

When I removed Elastic Search from logstash output, I got better numbers, but logstash process usage rose to 500%.
%{[time]} rate: 4299.910569505572
%{[time]} rate: 4364.3139484306475
%{[time]} rate: 4401.899953858465
%{[time]} rate: 4465.296741188148
%{[time]} rate: 4506.16230143131
%{[time]} rate: 4525.066816024065
%{[time]} rate: 4572.954869346784
%{[time]} rate: 4596.049650849632
%{[time]} rate: 4627.30831486682
%{[time]} rate: 4627.171725551646

A few things made it a little better end to end up to about 5000 events/sec... Of course CPU usage more than doubled everywhere...

  • Logstash LS_OPTS="-w 12 -b 2000"
  • Elastic Search output workers => 4

I would say filebeat is the bottleneck. I was able to push 50,000 EPS through Logstash with minimal filter in my performance test. One way to test is to remove all Logstash filter and measure the throughput.

Removing the filter didn't change the numbers with filebeat in the picture...

Although, once I removed FileBeat and used Logstash file input I could get more performance out of it.

I think the biggest bottleneck is the default (1) workers for Elastic Search Logstash output. I'll need to retest with filebeat.

File Beat -> Logstash (kv) -> Elastic Search : 5,000 EPS
Logstash (file input, kv, no output) : 15,600 EPS
Logstash (file input, no kv, no output) : 20,000 EPS
Logstash (file input, kv, es 4 workers)-> Elastic Search : 7,800 EPS
Logstash (file input, kv, es 8 workers)-> Elastic Search : 12,000 EPS

It's getting better... Looks like I'll need to do more testing introducing redis/kafka and having separate logstash for ingest and indexing.