Logstash -> multiple match patterns increase my load average


(Sebastian Herrera) #1

Hello, I am doing some parsing in Logstash. I have added multiple patterns to my grok match, and since then my load average has increased a lot. Is it possible that the number of patterns affects that, or that the regexes themselves cause it? I am receiving about 50 logs per second on average.

grok {
  patterns_dir => ["${CFG_DIR}/conf.d/patterns"]
  match => [
    "message", "%{EVENTID_4768_4771}",
    "message", "%{EVENTID_4776}",
    "message", "%{EVENTID_4625}",
    "message", "%{EVENTID_4740}",
    "message", "%{EVENTID_4724_4767_4738_4726}",
    "message", "%{EVENTID_4742}",
    "message", "%{EVENTID_4720}",
    "message", "%{EVENTID_4741}",
    "message", "%{EVENTID_18456}",
    "message", "%{FAILURE_AUTH_ACCOUNT}",
    "message", "%{SUCCESFUL_AUTH_ACCOUNT}",
    "message", "%{ACCOUNT_CREATION}",
    "message", "%{ACCOUNT_DELETION}",
    "message", "%{PASSWORD_CHANGED}",
    "message", "%{PASSWORD_CHANGED2}",
    "message", "%{FILE_DELETION}",
    "message", "%{ACCESS_ATTEMPT}",
    "message", "%{OBJECT_DELETION}",
    "message", "%{GROUP_DELETION}",
    "message", "%{GROUP_CREATION}",
    "message", "%{PROGRAM_INSTALLATION_DELETION}",
    "message", "%{EVENTID_4743}",
    "message", "%{EVENTID_1033_1034}",
    "message", "%{EVENTID_4727_4730}",
    "message", "%{EVENTID_4728_4729_4752}",
    "message", "%{CATCH_ALL}"
  ]
}
Here are some of the regexes:

#EVENTID 4724 4767 4738 4726
EVENTID_4724_4767_4738_4726 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}Target Account:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 4742
EVENTID_4742 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}Computer Account That Was Changed:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 4720
EVENTID_4720 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}New Account:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 4741
EVENTID_4741 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}New Computer Account:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 4743
EVENTID_4743 \s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}Target Computer:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}

#EVENTID 1033 1034
EVENTID_1033_1034 \s*%{GREEDYDATA:action}.Product Name: %{GREEDYDATA:product}. Product Version:%{GREEDYDATA}error status:\s*%{INT:status}

#EVENTID 4727 4730
EVENTID_4727_4730 \s*%{GREEDYDATA:action}.\sSubject: Security%{GREEDYDATA}Account Name:\s%{NOTSPACE:username_doing}\s*Account Domain:%{NOTSPACE:domain_doing}%{GREEDYDATA}(New Group|Deleted Group):%{GREEDYDATA}Group Name:%{GREEDYDATA:group}Group Domain%{NOTSPACE:domain_group}

#EVENTID 4728 4729 4752
EVENTID_4728_4729_4752 \s*%{GREEDYDATA:action}.\sSubject: Security%{GREEDYDATA}Account Name:\s%{NOTSPACE:username_doing}\s*Account Domain:%{NOTSPACE:domain_doing}%{GREEDYDATA}Member:%{GREEDYDATA}Account Name:(cn|CN)=%{GREEDYDATA:username},OU%{GREEDYDATA} Group Name:%{GREEDYDATA:group}Group Domain:%{NOTSPACE:domain_group}


(Philip Nunn) #2

A couple of things. First, your Logstash match option looks a bit odd to me. See the examples here. You don't need to insert "message" on each line; specify it once, followed by an array of patterns. Second, you have a lot of GREEDYDATA patterns, and those slow things way down. You should also be using anchors in your pattern definitions. I also wonder if you could use the dissect filter instead. You might want to check it out.
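
For illustration, the match layout described here would look roughly like the sketch below: the field name appears once as a hash key, with the patterns in an array. It is trimmed to the first two patterns and the catch-all from the original config; the remaining patterns would go into the same array. Since grok tries the patterns in order and, by default, stops at the first one that matches, putting the most frequent event types first may also reduce the work done per event.

```
grok {
  patterns_dir => ["${CFG_DIR}/conf.d/patterns"]
  match => {
    # One field name, one array of patterns, tried top to bottom
    "message" => [
      "%{EVENTID_4768_4771}",
      "%{EVENTID_4776}",
      # the remaining patterns from the original config go here, in order
      "%{CATCH_ALL}"
    ]
  }
}
```

This changes only the layout of the option; the patterns themselves are as expensive as before.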


(Christian Dahlqvist) #3

As pointed out, a lot of your grok patterns look very, very inefficient, as they use a lot of DATA and GREEDYDATA patterns. Have a look at this blog post about optimising grok patterns.
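
As a minimal sketch of the anchoring advice, here is the EVENTID_4720 definition from the first post with only a leading `^` added. The name EVENTID_4720_ANCHORED is hypothetical, and the sketch assumes the pattern is meant to match from the start of the message. With 26 patterns in the list, most events fail most patterns before one matches; an anchor keeps a failing pattern from being retried at every offset in the line, which may make those failures cheaper.

```
# Hypothetical anchored variant of EVENTID_4720 (assumes matching begins at the start of the message)
EVENTID_4720_ANCHORED ^\s*%{GREEDYDATA:action}.\sSubject%{GREEDYDATA} Account Name:%{NOTSPACE:username_doing}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain_doing}%{GREEDYDATA}New Account:%{GREEDYDATA}Account Name:%{NOTSPACE:username}%{GREEDYDATA}Account Domain:%{NOTSPACE:account_domain}
```

Replacing the unnamed GREEDYDATA gaps with something narrower would likely help more, but that depends on the exact layout of the log lines, which is not shown in this thread.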


(Philip Nunn) #4

Curious: are these events coming from Winlogbeat on an AD domain controller? If so, there may be an easier way to parse them using a custom pipeline, if the events come through with fields like param1, param2, etc.
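
If the events do arrive with the event ID already in its own field, one option is to route on that field so each event is tried against a single pattern instead of the whole list. The sketch below assumes a numeric field named `[event_id]`; that name is an assumption and depends on the shipper and its version. Only two of the event-ID patterns from the first post are shown.

```
filter {
  # Assumption: [event_id] exists and is numeric; adjust the field name to your events
  if [event_id] == 4720 {
    grok {
      patterns_dir => ["${CFG_DIR}/conf.d/patterns"]
      match => { "message" => "%{EVENTID_4720}" }
    }
  } else if [event_id] in [4724, 4767, 4738, 4726] {
    grok {
      patterns_dir => ["${CFG_DIR}/conf.d/patterns"]
      match => { "message" => "%{EVENTID_4724_4767_4738_4726}" }
    }
  } else {
    # Anything not routed above falls through to the catch-all pattern
    grok {
      patterns_dir => ["${CFG_DIR}/conf.d/patterns"]
      match => { "message" => "%{CATCH_ALL}" }
    }
  }
}
```

If the field holds a string rather than a number, the comparisons would need quoted values such as "4720".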


(Sebastian Herrera) #5

Thanks all, I will check the post you sent me and try to improve my regexes :)