After upgrade to 5.6.3 Logstash processing much slower

After updating from v1.7 to v5.6.3, my ELK + Filebeat web-log cluster is having performance problems processing our log files.

I am hoping there are configuration changes to Logstash or Filebeat that will let the web logs be processed in a timely manner.
I was going to wait until we had full support set up with Elastic, but my purchase order is stuck somewhere in accounting and our busy holiday e-commerce season is approaching quickly.

Some log files are still being processed 24 hours after they are generated; at that point they hit my close_timeout setting and the harvesters close.

I have 8 master/data nodes, each with 40 cores, 64 GB of RAM, and 14 TB of SSD disk.
I also have 4 index nodes with 2 cores and 8 GB of RAM each.

Our web log files are about 40 MB each and cover roughly 10 seconds of traffic.
We process between 6,500 and 9,000 of these a day as traffic rises and falls, for a total of 250-300 GB of web log data per day.

We only keep 8 days of data active in order to keep our Java Heap Memory usage under control.
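(At 250-300 GB/day, that works out to roughly 2-2.4 TB of raw log data held active at any one time, before indexing overhead and replicas.)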

I started the ELK stack with one Filebeat service and 3 Logstash services, with no files for the prospector to
read and nothing being processed. I ran two tests in this configuration:
TEST 1 -- I placed one of our 40 MB log files into the folder; the Filebeat prospector found it fairly quickly and started
processing the file, which contained 71,177 events. This file, representing about 10 seconds of our web traffic, took
over 10 minutes to process.

TEST 2 -- Knowing that the ELK stack is meant to process files in parallel, I tried the same experiment with four 40 MB files, or
294,011 events (about 42 seconds of traffic), which took 21.5 minutes.

A larger test run of 10 files, using the whole ELK stack of 2 Filebeat and 5 Logstash services, ran for 2.5 hours and processed 705,009 events.
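To put rough numbers on that: TEST 1 works out to about 71,177 events / 600 s ≈ 120 events/s for a single pipeline, and the 10-file run to about 705,009 events / 9,000 s ≈ 78 events/s for the whole stack, while the logs are generated at roughly 7,000 events/s (71,177 events per ~10 seconds of traffic). So ingestion is running far behind the generation rate.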

In addition to Elasticsearch on each node, I have installed the following:
1 copy of Kibana
2 Filebeat instances
5 Logstash instances -- 3 fed from one Filebeat instance and 2 from the other

FILEBEAT.YML -- only active lines included

filebeat.prospectors:
- input_type: log
  paths:
    - /data/3/tmp/Ex*
  document_type: nsweblog
  exclude_lines: ["^#"]
  encoding: utf-8
  ignore_older: 15h
  close_inactive: 10m
  close_removed: true
  # Set close_timeout to 24h or we will never catch up
  close_timeout: 24h
  # original value for clean_inactive was 72h -- cut in half to 36 to see if we can reduce the number of open files
  clean_inactive: 28h
  clean_removed: true
  # Certona -- try limiting the number of harvesters to 1000 for catchup after changing to new setup on 20171017
  harvester_limit: 1000

output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]
  hosts: ["1.1.1.62:5044","1.1.1.65:5044","1.1.1.66:5044"]
  loadbalance: true
  worker: 2
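For what it's worth, here is a variant of the output.logstash section I have been thinking about trying, if I am reading the 5.6 docs right; the numbers are guesses to experiment with, not tested values:

output.logstash:
  hosts: ["1.1.1.62:5044","1.1.1.65:5044","1.1.1.66:5044"]
  loadbalance: true
  # Guesses, not tested values: more connections per Logstash host, and
  # pipelining so Filebeat keeps sending batches while waiting for ACKs.
  worker: 4
  pipelining: 2
  # Events per batch sent to Logstash (I believe the 5.x default is 2048,
  # so this only matters if it was lowered somewhere).
  bulk_max_size: 4096

If anyone can confirm whether pipelining actually helps against a 5.6 beats input, I would appreciate it before rolling this out.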

LOGSTASH.YML Config File -- only active lines included

path.data: /var/lib/logstash
pipeline.workers: 80
pipeline.batch.size: 500
path.config: /etc/logstash/conf.d
path.logs: /var/log/logstash
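For comparison, this is the direction I was thinking of taking the pipeline settings, on the theory that 80 workers just oversubscribes whatever box Logstash is running on; these are placeholders to tune, not recommendations:

# Placeholder values -- the idea is to set workers to roughly the core count
# of the machine actually running Logstash instead of oversubscribing it.
pipeline.workers: 40
# Bigger batches amortize the per-bulk-request overhead against Elasticsearch,
# at the cost of more memory per in-flight batch.
pipeline.batch.size: 1000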

LOGSTASH.CONF File

input {
  beats {
    port => "5044"
  }
}

filter {
  grok {
    match => { "message" => ["%{TIMESTAMP_ISO8601:logtimestamp}%{SPACE}%{IPV4:c-ip}%{SPACE}%{USERNAME:cs-username}%{SPACE}(?:%{HOSTNAME:sc-servicename})%{SPACE}%{IPV4:s-ip}%{SPACE}%{BASE16NUM:s-port}%{SPACE}%{WORD:cs-method}%{SPACE}(%{URIPROTO}:/)?%{URIPATH:cs-uri-stem}%{SPACE}%{NOTSPACE:cs-uri-query}%{SPACE}%{INT:sc-status:int}%{SPACE}%{INT:cs-bytes:int}%{SPACE}%{INT:sc-bytes:int}%{SPACE}%{INT:time-taken:int}%{SPACE}%{NOTSPACE:cs-version}%{SPACE}%{NOTSPACE:cs-User-Agent}%{SPACE}%{USERNAME:cs-Cookie}%{SPACE}%{NOTSPACE:cs-Referer}"] }
  }
  geoip {
    source => "c-ip"
  }
  mutate {
    lowercase => ["cs-uri-query"]
  }
  kv {
    trim_value => "<>[]"
    field_split => "&"
    source => "cs-uri-query"
    include_keys => [ "appid","tk","ss","sc","ur","rf","pg","ev","ei","qty","pr","cu","tt","tr","plk","sg","no","bx","vr","trackingid","sessionid","scheme","url","referrer","pageid","event","eventitems","qty","price","customerid","total","transactionid","links","segment","number","campaignid" ]
  }
  mutate {
    rename => {
      "tk"  => "trackingid"
      "ss"  => "sessionid"
      "sc"  => "scheme"
      "ur"  => "url"
      "rf"  => "referrer"
      "pg"  => "pageid"
      "ev"  => "event"
      "ei"  => "eventitems"
      "qty" => "qty"
      "pr"  => "price"
      "cu"  => "customerid"
      "tt"  => "total"
      "tr"  => "transactionid"
      "plk" => "links"
      "sg"  => "segment"
      "no"  => "number"
      "ex"  => "exitemid"
    }
  }
  date {
    match => [ "logtimestamp", "yyyy-MM-dd HH:mm:ss" ]
    timezone => "UTC"
  }
}

output {
  elasticsearch {
    user => "user"
    password => "password"
    index => "nsweblog-%{+YYYY.MM.dd.HH}"
    hosts => ["1.1.1.61:9200","1.1.1.62:9200","1.1.1.63:9200","1.1.1.64:9200","1.1.1.65:9200","1.1.1.66:9200","1.1.1.67:9200","1.1.1.68:9200"]
  }
}
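To work out whether the filters or the Elasticsearch output is the slow part, the test I am planning next (a sketch only -- the output block gets swapped back afterwards) is to run the same pipeline but discard the events using the stdout dots codec, which prints one "." per event:

output {
  # Temporary benchmarking output: one dot per processed event, so filter
  # throughput can be measured with Elasticsearch out of the path.
  stdout { codec => dots }
}

If that run is fast, the indexing side (hourly indices, mappings, refresh settings) is where I need to look; if it is still slow, the grok and kv filters are the first suspects.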
