[SOLVED] Date filter slowing pipeline

mikula · November 18, 2015, 9:53am

Hello,
I have a problem: when I turn on date parsing in my Logstash config, the whole pipeline gets a LOT slower (about 10 times).
My ELK stack is quite capable (4 servers, each with 8 Xeon cores and 16 GB of memory), with the whole ELK stack running on them.
Any help would be appreciated.
A

warkolm · November 18, 2015, 10:20pm

Providing your config and the version you are on would help.

mikula · November 20, 2015, 7:58am

Hello,
Sorry that I haven't done so before. I just learned about LS and ES 2.
But I have had no luck; my problem is still present.
I have around 300 servers, each running a logstash-shipper instance and shipping to 4 LS/ES servers. Each of those has two 4-core Xeons and 16 GB of RAM and runs one ES instance and two LS instances. Everything is indexed locally into ES and then shipped to another server to be stored in a gzipped file.
That last server also listens on the syslog port for the remaining devices, which either cannot run LS or have yet to be migrated to it; it is also part of the ES cluster.
The whole pipeline has a throughput (according to ES) of around 8000 entries per minute.
When I turn off date parsing, I get more than ten times that throughput.
This is a sample config (the others differ just in the input, some local files or syslog input, and/or the output, also local files).

LS config
It is too long to post, so here it is (the link is valid for one month):
http://pastebin.com/bX1cb2A4

grok patterns

TZ (?:[PMCE][SD]T|UTC|CEST|CET)
TIMESTAMP_ISO8601_STR %{YEAR}%{MONTHNUM}%{MONTHDAY}T?%{HOUR}?%{MINUTE}?%{SECOND}
MESSAGE %{GREEDYDATA}
MAILQUEUE (?:[0-9A-F]{9,14})
TIME_12 %{TIME} ?(?<ampm>[AaPp][Mm]?)
DMDTY %{DAY} ?%{MONTH} ?%{MONTHDAY} %{TIME} %{YEAR}
YMD %{YEAR}[-/ ]%{MONTHNUM}[-/ ]%{MONTHDAY}
YMDT %{YMD}[ -/.]+%{TIME}
YMDR %{YEAR}%{MONTHNUM}%{MONTHDAY}

Christian_Dahlqvist · November 20, 2015, 11:38am

You have a very large number of patterns for each date filter to process, which will cause a lot of regular expression parsing. I would recommend trying to normalise the date before applying the date filter, e.g. by using a mutate gsub to replace all commas with periods and possibly also zero pad single digits. This should reduce the number of required patterns quite a bit. I would also recommend trying to put the most common pattern first in the list.
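
As a rough sketch of that suggestion (the field name `timestamp` and the two formats shown are assumptions for illustration, not taken from the linked config), normalising first and then matching against a short, ordered list might look something like this:

```
filter {
  # "timestamp" is a hypothetical field name; use whichever field
  # your grok stage extracts the raw date into.
  mutate {
    # Replace every comma with a period, so that e.g.
    # "2015-11-18 09:53:00,123" becomes "2015-11-18 09:53:00.123"
    # and one format covers both variants.
    gsub => [ "timestamp", ",", "." ]
  }
  date {
    # Formats are tried in order, so put the most common one first.
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS", "ISO8601" ]
  }
}
```

The point is that each variant removed by the gsub is one format fewer that the date filter has to attempt on every event that does not match the first entry.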

mikula · November 30, 2015, 3:29pm

Thank you for the help.
It seems that reducing the number of patterns was just one side of the coin.
The other was that the load on the last cluster was too high.
I switched to an sshfs-based network mount to alleviate the load.
And now all seems OK.
AM
