After updating from V1.7 to V5.6.3, my ELK + Filebeat web log cluster is having performance problems processing the log files.
I am hoping there are some configuration changes to Logstash or Filebeat that I can make so the web logs are processed in a timely manner.
I was going to wait until we got full support set up with Elastic, but my purchase order is stuck somewhere in accounting and our busy holiday e-commerce season is approaching quickly.
I have some log files still being processed 24 hours after they are generated; at that point, with my current settings, they time out and are closed.
I have 8 master/data nodes with 40 cores, 64 GB of RAM, and 14 TB of SSD disk.
I also have 4 index nodes with 2 cores and 8 GB of RAM.
Our web log files are about 40 MB each and cover roughly 10 seconds of traffic.
We process between 6,500 and 9,000 of these a day as traffic goes up and down, for a total of 250-300 GB of web log data per day.
We only keep 8 days of data active in order to keep our Java Heap Memory usage under control.
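To put the required throughput in perspective, here is a rough back-of-envelope calculation in Python using the figures above (it pairs the low ends and high ends of the ranges, so treat the results as approximations):

# Rough estimate of the sustained rate needed just to keep up with new files.
files_per_day = (6500, 9000)   # 40 MB web log files generated per day
gb_per_day = (250, 300)        # total raw web log volume per day
seconds_per_day = 24 * 60 * 60

for files, gb in zip(files_per_day, gb_per_day):
    file_interval = seconds_per_day / files      # a new file lands roughly every N seconds
    mb_per_sec = gb * 1024 / seconds_per_day     # sustained MB/s needed
    print(f"{files} files/day -> one new 40 MB file every {file_interval:.1f}s, ~{mb_per_sec:.1f} MB/s sustained")

# 6500 files/day -> one new 40 MB file every 13.3s, ~3.0 MB/s sustained
# 9000 files/day -> one new 40 MB file every 9.6s, ~3.6 MB/s sustained

In other words, a new 40 MB file arrives roughly every 10-13 seconds, so each file needs to be fully processed in about that long for the cluster to stay caught up.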
I started the ELK stack with one Filebeat service and 3 Logstash services, with no files for the prospector to read and nothing being processed. I ran two tests in this configuration:
TEST1 -- I placed one of our 40 MB log files into the folder; the Filebeat prospector found it fairly quickly and started processing the 71,177 events in it. This log file, which represents about 10 seconds of our web traffic, took over 10 minutes to process.
TEST2 -- Knowing that the ELK stack is meant for parallel processing of files, I tried the same experiment with four of the 40 MB files, or 294,011 events (42 seconds of traffic), which took 21.5 minutes.
A larger test run of 10 files, using the whole ELK stack of 2 Filebeat and 5 Logstash services, ran for 2.5 hours to process 705,009 events.
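For comparison, here is the effective indexing rate each of those tests achieved versus the rate at which events are generated. This is a rough Python calculation; I am treating "over 10 minutes" as 10 minutes, and using the ~71,000 events per 10-second file to estimate the generation rate:

# Effective events/sec achieved in each test (durations are approximate).
tests = {
    "TEST1 (1 file, 1 Filebeat + 3 Logstash)":        (71_177, 10 * 60),
    "TEST2 (4 files, 1 Filebeat + 3 Logstash)":       (294_011, 21.5 * 60),
    "Larger run (10 files, 2 Filebeat + 5 Logstash)": (705_009, 2.5 * 3600),
}

for name, (events, seconds) in tests.items():
    print(f"{name}: {events / seconds:,.0f} events/sec")

# TEST1 (1 file, 1 Filebeat + 3 Logstash): 119 events/sec
# TEST2 (4 files, 1 Filebeat + 3 Logstash): 228 events/sec
# Larger run (10 files, 2 Filebeat + 5 Logstash): 78 events/sec

# One 10-second log file holds ~71,000 events, so at that traffic level the
# site generates on the order of 7,000 events/sec that need indexing.
print(f"~{71_177 / 10:,.0f} events/sec generated")

So the pipeline is indexing on the order of 100-200 events per second while the site generates thousands per second, which is why the backlog keeps growing.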
In addition to Elasticsearch on each node, I have installed the following:
1 copy of Kibana
2 Filebeat instances
5 Logstash instances -- 3 fed from one Filebeat instance and 2 from the other
FILEBEAT.YML -- only active lines included
filebeat.prospectors:
- input_type: log
  paths:
    - /data/3/tmp/Ex*
  document_type: nsweblog
  exclude_lines: ["^#"]
  encoding: utf-8
  ignore_older: 15h
  close_inactive: 10m
  close_removed: true
  # Set close_timeout to 24h or we will never catch up
  close_timeout: 24h
  # original value for clean_inactive was 72h -- cut in half to 36 to see if we can reduce the number of open files
  clean_inactive: 28h
  clean_removed: true
  # Certona -- try limiting the number of harvesters to 1000 for catchup after changing to new setup on 20171017
  harvester_limit: 1000
output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]
  hosts: ["1.1.1.62:5044","1.1.1.65:5044","1.1.1.66:5044"]
  loadbalance: true
  worker: 2
LOGSTASH.YML Config File -- only active lines included
path.data: /var/lib/logstash
pipeline.workers: 80
pipeline.batch.size: 500
path.config: /etc/logstash/conf.d
path.logs: /var/log/logstash
LOGSTASH.CONF File
input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => {"message" => ["%{TIMESTAMP_ISO8601:logtimestamp}%{SPACE}%{IPV4:c-ip}%{SPACE}%{USERNAME:cs-username}%{SPACE}(?:%{HOSTNAME:sc-servicename})%{SPACE}%{IPV4:s-ip}%{SPACE}%{BASE16NUM:s-port}%{SPACE}%{WORD:cs-method}%{SPACE}(%{URIPROTO}:/)?%{URIPATH:cs-uri-stem}%{SPACE}%{NOTSPACE:cs-uri-query}%{SPACE}%{INT:sc-status:int}%{SPACE}%{INT:cs-bytes:int}%{SPACE}%{INT:sc-bytes:int}%{SPACE}%{INT:time-taken:int}%{SPACE}%{NOTSPACE:cs-version}%{SPACE}%{NOTSPACE:cs-User-Agent}%{SPACE}%{USERNAME:cs-Cookie}%{SPACE}%{NOTSPACE:cs-Referer}"] }
  }
  geoip {
    source => "c-ip"
  }
  mutate {
    lowercase => ["cs-uri-query"]
  }
  kv {
    trim_value => "<>[]"
    field_split => "&"
    source => "cs-uri-query"
    include_keys => [ "appid","tk","ss","sc","ur","rf","pg","ev","ei","qty","pr","cu","tt","tr","plk","sg","no","bx","vr","trackingid","sessionid","scheme","url","referrer","pageid","event","eventitems","qty","price","customerid","total","transactionid","links","segment","number","campaignid" ]
  }
  mutate {
    rename => {
      "tk"  => "trackingid"
      "ss"  => "sessionid"
      "sc"  => "scheme"
      "ur"  => "url"
      "rf"  => "referrer"
      "pg"  => "pageid"
      "ev"  => "event"
      "ei"  => "eventitems"
      "qty" => "qty"
      "pr"  => "price"
      "cu"  => "customerid"
      "tt"  => "total"
      "tr"  => "transactionid"
      "plk" => "links"
      "sg"  => "segment"
      "no"  => "number"
      "ex"  => "exitemid"
    }
  }
  date {
    match => [ "logtimestamp", "yyyy-MM-dd HH:mm:ss" ]
    timezone => "UTC"
  }
}
output {
  elasticsearch {
    user => "user"
    password => "password"
    index => "nsweblog-%{+YYYY.MM.dd.HH}"
    hosts => ["1.1.1.61:9200","1.1.1.62:9200","1.1.1.63:9200","1.1.1.64:9200","1.1.1.65:9200","1.1.1.66:9200","1.1.1.67:9200","1.1.1.68:9200"]
  }
}