Large number of Grok patterns

Hi All,

I am trying to figure out the best way to do event classification.
In my case, below is a sample event:

Jul 21 01:19:57.58 date=2016-07-20 time=20:12:25 timezone="UTC" device_name="X2123" device_id=C032302-2323 log_id=98393783 log_type="Event" log_component="GUI" log_subtype="Admin" status="Successful" priority=Notice user_name="admin" src_ip= dmsg="Appliance Access Settings were changed by 'admin' from '' using 'GUI'"

I would like to classify events based on the content of dmsg. I have almost 1000 events, each with unique content in dmsg, and I have written a grok pattern for each such unique message, 1000 in total. Below is one example:

PID107 Appliance Access Settings were changed by %{QSSTRING} from %{QSSTRING} using %{QSSTRING}

Another example:

PID144 Service %{QSSTRING} was started by %{QSSTRING} from %{QSSTRING} using %{QSSTRING}

I have created a configuration file like this:

		if "unclassified" in [tags] and [dmsg] {
			grok {
				patterns_dir => ["/etc/logstash-indexer/patterns"]
				match => { "dmsg" => "%{PID107}" }
				add_field => { "eventid" => "PID107" }
				remove_tag => "unclassified"
			}
		}

		if "unclassified" in [tags] and [dmsg] {
			grok {
				patterns_dir => ["/etc/logstash-indexer/patterns"]
				match => { "dmsg" => "%{PID144}" }
				add_field => { "eventid" => "PID144" }
				remove_tag => "unclassified"
			}
		}

		if [eventid] {
			translate {
				field => "eventid"
				destination => "eventtype"
				dictionary_path => "/data/eventtype.csv"
				fallback => "unknown"
			}
		}

The above works perfectly fine. I then added grok match filters for the remaining patterns, up to 1000, like this:

			grok {
				patterns_dir => ["/etc/logstash-indexer/patterns"]
				match => { "dmsg" => "%{PID1000}" }
				add_field => { "eventid" => "PID1000" }
				remove_tag => "unclassified"
			}

I am stuck here!

Logstash takes 5-6 minutes to load the configuration and then fails with this error:

/opt/logstash/bin/logstash -f /etc/logstash-indexer/conf.d/ -t
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /opt/logstash/heapdump.hprof ...
Unable to create /opt/logstash/heapdump.hprof: File exists
Error: Your application used more memory than the safety cap of 1G.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace

I can increase the memory, but is this the best way to classify 1000+ events? Any suggestions for a better approach?

Thank you.


Just capture the first token ("PIDxxx") into the eventid field and you only need a single grok filter. In other words, change the beginning of each grok pattern to e.g. %{WORD:eventid}. But I don't get why you need to match the whole strings at all. Why not just grab the event id from the string and call it a day? You don't appear to be extracting anything from the message anyway, so I don't see how it matters whether "PID107" is followed by "Appliance Access Settings ..." or something else. Logstash is a log processor, not a validator.
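
A minimal sketch of that single-filter setup, assuming the identifier really is the first token of the dmsg field (the sample event above does not show one there, so treat that layout as hypothetical):

		filter {
			if "unclassified" in [tags] and [dmsg] {
				grok {
					# Assumption: dmsg begins with the event id, e.g. "PID107 Appliance Access ..."
					# WORD is a stock grok pattern; the capture is stored in the eventid field.
					match => { "dmsg" => "^%{WORD:eventid}" }
					# Only applied when the match succeeds.
					remove_tag => ["unclassified"]
				}
			}
		}

Events that do not match keep the unclassified tag and pick up grok's default _grokparsefailure tag, and the existing translate filter can map eventid to eventtype as before.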