Logstash -> multiple match patterns increase my load average

Hello, I am doing some parsing in Logstash. I added multiple patterns to my grok match, and since then my load average has increased considerably. Is it possible that the number of patterns causes that, or could the regexes themselves be responsible? I am receiving about 50 logs per second on average.

grok {
  patterns_dir => ["${CFG_DIR}/conf.d/patterns"]
  match => [
    "message", "%{EVENTID_4768_4771}",
    "message", "%{EVENTID_4776}",
    "message", "%{EVENTID_4625}",
    "message", "%{EVENTID_4740}",
    "message", "%{EVENTID_4724_4767_4738_4726}",
    "message", "%{EVENTID_4742}",
    "message", "%{EVENTID_4720}",
    "message", "%{EVENTID_4741}",
    "message", "%{EVENTID_18456}",
    "message", "%{FAILURE_AUTH_ACCOUNT}",
    "message", "%{SUCCESFUL_AUTH_ACCOUNT}",
    "message", "%{ACCOUNT_CREATION}",
    "message", "%{ACCOUNT_DELETION}",
    "message", "%{PASSWORD_CHANGED}",
    "message", "%{PASSWORD_CHANGED2}",
    "message", "%{FILE_DELETION}",
    "message", "%{ACCESS_ATTEMPT}",
    "message", "%{OBJECT_DELETION}",
    "message", "%{GROUP_DELETION}",
    "message", "%{GROUP_CREATION}",
    "message", "%{PROGRAM_INSTALLATION_DELETION}",
    "message", "%{EVENTID_4743}",
    "message", "%{EVENTID_1033_1034}",
    "message", "%{EVENTID_4727_4730}",
    "message", "%{EVENTID_4728_4729_4752}",
    "message", "%{CATCH_ALL}"
  ]
}
Here are some of the regexes:

#EVENTID 4724 4767 4738 4726
EVENTID_4724_4767_4738_4726 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}Target Account:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 4742
EVENTID_4742 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}Computer Account That Was Changed:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 4720
EVENTID_4720 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}New Account:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 4741
EVENTID_4741 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}New Computer Account:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 4743
EVENTID_4743 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}Target Computer:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 1033 1034
EVENTID_1033_1034 \s*%{GREEDYDATA:action}.Product Name: %{GREEDYDATA:product}. Product Version:%{GREEDYDATA}error status:\s*%{INT:status}

#EVENTID 4727 4730
EVENTID_4727_4730 \s*%{GREEDYDATA:action}.\sSubject: Security%{GREEDYDATA}Account Name:\s%{NOTSPACE:username_doing}\s*Account Domain:%{NOTSPACE:domain_doing}%{GREEDYDATA}(New Group|Deleted Group):%{GREEDYDATA}Group Name:%{GREEDYDATA:group}Group Domain%{NOTSPACE:domain_group}

#EVENTID 4728 4729 4752
EVENTID_4728_4729_4752 \s*%{GREEDYDATA:action}.\sSubject: Security%{GREEDYDATA}Account Name:\s%{NOTSPACE:username_doing}\s*Account Domain:%{NOTSPACE:domain_doing}%{GREEDYDATA}Member:%{GREEDYDATA}Account Name:(cn|CN)=%{GREEDYDATA:username},OU%{GREEDYDATA} Group Name:%{GREEDYDATA:group}Group Domain:%{NOTSPACE:domain_group}

A couple of things. First, your grok match option looks a bit odd to me. See the examples here. You don't need to repeat "message" on each line; specify it once, followed by an array of patterns. Second, you have a lot of GREEDYDATA patterns, and those slow things way down, because each one forces the regex engine to backtrack, and a message has to fail every earlier pattern in the list before it reaches the one that matches. You should also be using anchors in your pattern definitions so that a non-matching pattern fails quickly. I also wonder whether you could use the dissect filter instead. You might want to check it out.
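A minimal sketch of the match layout described above, using a few of the pattern names from the question. The leading ^ anchors and the ordering are assumptions for illustration, not a tested configuration; CATCH_ALL is left unanchored because its definition is not shown in the thread.

```
filter {
  grok {
    patterns_dir => ["${CFG_DIR}/conf.d/patterns"]
    # "message" appears once and maps to an array of patterns.
    # Grok tries them in order and, with the default break_on_match => true,
    # stops at the first pattern that matches.
    match => {
      "message" => [
        "^%{EVENTID_4768_4771}",
        "^%{EVENTID_4776}",
        "^%{EVENTID_4625}",
        # the remaining patterns from the original list go here
        "%{CATCH_ALL}"
      ]
    }
  }
}
```

Since matching stops at the first hit, putting the patterns for the most frequent events near the top of the array should reduce the number of failed attempts per message.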

As pointed out, many of your grok patterns look very inefficient because they use a lot of DATA and GREEDYDATA patterns. Have a look at this blog post on optimising grok patterns.
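As a rough illustration of that advice, here is one way the EVENTID_4720 definition from the question might be tightened. The name EVENTID_4720_TIGHT is hypothetical. The sketch adds a start anchor and replaces the GREEDYDATA between "Account Name:" and "Account Domain:" with \s*, on the assumption (borrowed from the EVENTID_4727_4730 pattern in the question) that only whitespace separates those two fields. The other GREEDYDATA segments are kept because the thread does not show what they have to skip over.

```
# Hypothetical tightened variant of EVENTID_4720; verify against real sample events before use.
EVENTID_4720_TIGHT ^\s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}\s*Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}New Account:%{GREEDYDATA}Account Name:%{NOTSPACE:username}\s*Account Domain:%{NOTSPACE:account_domain}
```

If the whitespace assumption does not hold for your events, the pattern will simply stop matching, so it is worth checking it against a handful of real messages first.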

Curious...are these events coming from Winlogbeat on an AD domain controller? If so, there may be an easier way to parse this using a custom pipeline if the events come through like param1, param2, etc.

Thanks all, I will check the post you sent me and try to improve my regexes :)
