[SOLVED] Date filter slowing pipeline


#1

Hello,
I have a problem: when I turn on date parsing in my Logstash config, the whole pipeline gets a LOT slower (about 10 times).
My ELK stack runs on quite capable hardware (4 servers, each with an 8-core Xeon setup and 16 GB of memory), with the whole ELK stack running on them.
Any help would be appreciated.
A


(Mark Walkom) #2

Providing your config and the version you are on would help.


#3

Hello,
Sorry that I didn't do so before. I just learned about LS and ES 2,
but I had no luck: my problem is still present.
I have around 300 servers, each running a logstash-shipper instance, shipping to 4 LS/ES servers. Each of those has two 4-core Xeons and 16 GB of RAM and runs one ES instance and two LS instances. Everything is indexed locally into ES and then shipped to another server to be stored in a gzipped file.
That last server also listens on the syslog port for the rest of the devices, which are either incapable of running LS or still to be migrated to it. It is also part of the ES cluster.
The whole pipeline has a throughput (according to ES) of around 8000 entries per minute.
When I turn off date parsing, I get more than tenfold throughput.
This is a sample config (the others differ only in the input: some use local files, some a syslog input; and/or in the output, which is also local files).

The LS config is too long to post here, so here is a link (valid for one month):
http://pastebin.com/bX1cb2A4

grok patterns

TZ (?:[PMCE][SD]T|UTC|CEST|CET)
TIMESTAMP_ISO8601_STR %{YEAR}%{MONTHNUM}%{MONTHDAY}T?%{HOUR}?%{MINUTE}?%{SECOND}
MESSAGE %{GREEDYDATA}
MAILQUEUE (?:[0-9A-F]{9,14})
TIME_12 %{TIME} ?(?<ampm>[AaPp][Mm]?)
DMDTY %{DAY} ?%{MONTH} ?%{MONTHDAY} %{TIME} %{YEAR}
YMD %{YEAR}[-/ ]%{MONTHNUM}[-/ ]%{MONTHDAY}
YMDT %{YMD}[ -/.]+%{TIME}
YMDR %{YEAR}%{MONTHNUM}%{MONTHDAY}

(Christian Dahlqvist) #4

You have a very large number of patterns for each date filter to process, which will cause a lot of regular expression parsing. I would recommend trying to normalise the date before applying the date filter, e.g. by using a mutate gsub to replace all commas with periods and possibly also zero pad single digits. This should reduce the number of required patterns quite a bit. I would also recommend trying to put the most common pattern first in the list.
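
As a rough sketch of the suggestion above (not taken from the linked config): the field name `timestamp` and the two date formats are assumptions for illustration only, so substitute whatever your grok stage actually produces and whichever format is most frequent in your logs.

```
filter {
  # Normalise first so that fewer date patterns are needed:
  # replace every comma in the (hypothetical) timestamp field with a period.
  mutate {
    gsub => [ "timestamp", ",", "." ]
  }
  # Formats are tried in the order listed, so put the most common one first.
  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601" ]
  }
}
```

With the separators normalised up front, one format can cover what previously needed separate comma and period variants, which should shorten the list each event has to be tried against.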


#5

Thank you for the help.
It seems that reducing the number of patterns was just one side of the coin.
The other was that the load on the last cluster was too high.
I switched to an sshfs-based network mount to alleviate the load.
And now all seems OK.
AM

