Hello all!
I was curious if adding more beats inputs (and adding the additional ports to filebeat and loadbalance:true) would increase throughput of logstash?
Not significantly I think, if at all. The beats input code looks fairly multithreaded and your bottleneck is probably going to be your filters and/or outputs anyway.
Gotcha, OK. If my CPU usage looks pretty low and the load on my ES cluster is fairly low, is there a way to process more filters simultaneously?
You can increase the number of filter threads with the -w startup option. Perhaps you have a dns filter or similar that's adding latency to the processing?
I did increase the filter threads and the output workers on the elasticsearch output, and that definitely made a difference. I can see the elasticsearch servers are reporting a much higher load. I reduced the index refresh on the high-insert indices and that helped a lot. Still doing some fine tuning to get higher insert rates.
Thanks!
Here is my config as well
filter {
  grok {
    patterns_dir => "./patterns"
    match => { "message" => "%{SYSLOGLINE}" }
  }
  ruby {
    code => "event['[timestamp]'] = event['[timestamp][1]']"
  }
  ruby {
    code => "event['[originalmsg]'] = event['[message][0]']"
  }
  if [originalmsg] == "0" {
    drop { }
  }
  date {
    match => [ "timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    timezone => "UTC"
  }
  ruby {
    code => "event['[microtime]'] = Time.now.to_f"
  }
  ruby {
    code => "event['[message]'] = event['[message][1]']"
  }
  grok {
    patterns_dir => "./patterns"
    match => {
      "message" => [
        "INFO: <script>: %{SYSLOG5424SD:callid}",
        "\[LOGSTASH] \[REGISTER] %{SYSLOG5424SD:remove} %{SYSLOG5424SD:callid}",
        "%{SYSLOG5424SD:callid}",
        "ERROR: %{GREEDYDATA}"
      ]
    }
  }
  mutate {
    remove_field => [ "%{remove}" ]
  }
  mutate {
    gsub => [
      "callid", "\[", "",
      "callid", "\]", ""
    ]
  }
  if [callid] == "lvalue.c:347" {
    drop {}
  }
  if [callid] == "lvalue.c:407" {
    drop {}
  }
}
Weird set of filters. Can you post an example input message and the expected output?
Sure!
syslog message is
Nov 3 01:05:03 sip-central01 /usr/local/sbin/kamailio[2251]: [1148651948-5078-1] 233008607 [REGISTER][REGISTER] [sip:3605531383@sip-central01.voipwelcome.com:5078;user=phone][50.137.25.153] Registered
Output is
{
"message" => "[1148651948-5078-1] 233008607 [REGISTER][REGISTER] [sip:3605531383@sip-central01.voipwelcome.com:5078;user=phone][50.137.25.153] Registered",
"@version" => "1",
"@timestamp" => "2015-11-03T01:05:03.000Z",
"count" => 1,
"fields" => nil,
"fileinfo" => {},
"line" => 20167013,
"offset" => 5635586029,
"shipper" => "sip-central01",
"source" => "/var/log/kamailio/2015-11-03.log",
"timestamp" => "Nov 3 01:05:03",
"type" => "log",
"logsource" => "sip-central01",
"program" => "/usr/local/sbin/kamailio",
"pid" => "2251",
"originalmsg" => "Nov 3 01:05:03 sip-central01 /usr/local/sbin/kamailio[2251]: [1148651948-5078-1] 233008607 [REGISTER][REGISTER] [sip:3605531383@sip-central01.voipwelcome.com:5078;user=phone][50.137.25.153] Registered",
"microtime" => 1446675476.939,
"callid" => "1148651948-5078-1"
}
With only basic grok match
{
"message" => [
[0] "Nov 4 15:23:45 sip-core02 /usr/local/sbin/kamailio[20804]: INFO: <script>: [208182901_131874650@192.168.18.110] SELECT b.mailbox, a.ring_count, b.status FROM phone_numbers a LEFT JOIN voicemail b ON (a.voicemail_id = b.id) WHERE a.id = '167436'",
[1] "INFO: <script>: [208182901_131874650@192.168.18.110] SELECT b.mailbox, a.ring_count, b.status FROM phone_numbers a LEFT JOIN voicemail b ON (a.voicemail_id = b.id) WHERE a.id = '167436'"
],
"@version" => "1",
"@timestamp" => "2015-11-04T22:20:39.032Z",
"count" => 1,
"fields" => nil,
"fileinfo" => {},
"line" => 2415401,
"offset" => 415513004,
"shipper" => "sip-core02",
"source" => "/var/log/kamailio/2015-11-04.log",
"timestamp" => [
[0] "2015-11-04T22:20:14.961Z",
[1] "Nov 4 15:23:45"
],
"type" => "log",
"logsource" => "sip-core02",
"program" => "/usr/local/sbin/kamailio",
"pid" => "20804"
}
(filter with above message)
filter {
  grok {
    patterns_dir => "./patterns"
    match => { "message" => "%{SYSLOGLINE}" }
  }
}
With no filters at all
{
"message" => "Nov 2 22:50:06 sip-core01 /usr/local/sbin/kamailio[13758]: INFO: <script>: [1974599503-5078-570@BA.A.A.D] replace from-uri=<null> from-name=<null>",
"@version" => "1",
"@timestamp" => "2015-11-04T22:22:31.236Z",
"count" => 1,
"fields" => nil,
"fileinfo" => {},
"line" => 22388245,
"offset" => 4258408627,
"shipper" => "sip-core01",
"source" => "/var/log/kamailio/2015-11-02.log",
"timestamp" => "2015-11-04T22:22:07.895Z",
"type" => "log"
}
Logs are about 2-5GB per day, with up to 24333203 lines in the largest file from the last 5 days.
It looks like your SYSLOGLINE pattern captures multiple instances of the timestamp, originalmsg, and message fields. Why not just capture them to different fields from the start (or not capture at all if you don't actually need the values) so that you can drop your three ruby filters?
Secondly, what is
mutate {
remove_field => [ "%{remove}" ]
}
supposed to do? It either deletes the field whose name is given by the remove field or it removes the field named %{remove} (not sure if remove_field does %{varname} interpolation), neither of which makes any sense. If what you really wanted to do is remove the remove field, just don't capture that string (say %{SYSLOG5424SD} in the grok expression).
Finally, if you want to avoid the gsub you can just exclude the square bracket from the match by capturing just what's inside it. Untested:
\[(?<callid>[^\]]+)\]
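As a rough sketch of how the two suggestions above could fit into the second grok filter from the posted config: the patterns below are adapted from that config, drop the :remove capture, and capture only the text inside the brackets. Like the expression above, this is untested, and the combined "in" conditional at the end is an assumption about how the two drop checks might be merged, not something from the thread.

```
filter {
  grok {
    patterns_dir => "./patterns"
    match => {
      "message" => [
        # Capture only what is inside the brackets, so no gsub is needed
        "INFO: <script>: \[(?<callid>[^\]]+)\]",
        # %{SYSLOG5424SD} without a field name matches but stores nothing,
        # so there is no "remove" field to clean up afterwards
        "\[LOGSTASH\] \[REGISTER\] %{SYSLOG5424SD} \[(?<callid>[^\]]+)\]",
        "\[(?<callid>[^\]]+)\]",
        "ERROR: %{GREEDYDATA}"
      ]
    }
  }
  # Assumption: merge the two separate drop conditionals into one
  if [callid] in [ "lvalue.c:347", "lvalue.c:407" ] {
    drop { }
  }
}
```

With this shape the two mutate filters from the posted config (remove_field and gsub) should no longer be needed.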
Thanks a lot for the help!
I did manage to clean this up quite a bit by using named_captures_only. The timestamp issue was that the beats input adds a timestamp field and the grok also captures one, so I changed the grok to capture timestamp2 instead and remove both timestamp and timestamp2 on a successful date match.
I also took your advice on getting rid of the gsub, and the remove field thing was just a crude hack.
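The poster's final config is not shown, so the following is only a guess at what the timestamp2 change might look like. The expanded pattern (in place of %{SYSLOGLINE}) and the overwrite setting are assumptions added for illustration; only the timestamp2 name, named_captures_only, and the removal of both fields on date match come from the description above.

```
filter {
  grok {
    # Assumed replacement for %{SYSLOGLINE}: the syslog time goes into
    # timestamp2 so it does not collide with the timestamp field that
    # the beats input already sets
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp2} %{SYSLOGHOST:logsource} %{SYSLOGPROG}: %{GREEDYDATA:message}" }
    named_captures_only => true
    # Assumption: replace message with the captured remainder instead of
    # turning it into a two-element array
    overwrite => [ "message" ]
  }
  date {
    match => [ "timestamp2", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    timezone => "UTC"
    # remove_field is only applied when the date filter succeeds
    remove_field => [ "timestamp", "timestamp2" ]
  }
}
```

Capturing into distinct fields up front like this is what would make the three ruby filters that unpacked the arrays unnecessary.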
In filebeat I also changed the harvester size and the spool size to something much larger: the harvester size is now 32786 and the spool size is 15000. Since this server has 24 cores and 48 GB of RAM, it seems to be performing much better. I also moved my beefiest server to the first position in the load-balanced logstash array, and changed the flush size to 5000 and the idle timeout to 5s on the elasticsearch output. In elasticsearch I also changed the index refresh interval to 30s, and that helped a lot as well.
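For reference, the elasticsearch output settings described above might look roughly like this. It assumes a Logstash 2.x-era elasticsearch output plugin, where flush_size, idle_flush_timeout, and workers were available options (later plugin versions changed or removed them). The hosts value and the worker count are placeholders, since the thread does not give them.

```
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]   # placeholder, not from the thread
    workers => 4                    # placeholder; the actual count was not stated
    flush_size => 5000              # events per bulk request, as described above
    idle_flush_timeout => 5         # seconds to wait before flushing a partial batch
  }
}
```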
With all these improvements combined I seem to have gotten much better performance and I appreciate all the help!