Help needed with multiline codec, current pattern too slow


(Nikhil S) #1

Hello, for these set of input lines:

SAMPLE INPUT

INFO   | jvm 1    | 2017/08/28 17:54:14 | 25306.601: Total time for which application threads were stopped: 0.0008632 seconds, Stopping threads took: 0.0000905 seconds
INFO   | jvm 1    | 2017/08/28 17:54:17 | Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
INFO   | jvm 1    | 2017/08/28 17:54:17 | 
INFO   | jvm 1    | 2017/08/28 17:54:17 | "something-1" something
INFO   | jvm 1    | 2017/08/28 17:54:17 |    java.lang.Thread.State: TIMED_WAITING (on object monitor)
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	- locked something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 
INFO   | jvm 1    | 2017/08/28 17:54:17 | Exception in thread "something-2" something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | Caused by: something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	... 12 more
INFO   | jvm 1    | 2017/08/28 17:54:17	| Caused by: something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17 | 	at something
INFO   | jvm 1    | 2017/08/28 17:54:17	|	... 15 more
INFO   | jvm 1    | 2017/08/28 17:54:17 | 
INFO   | jvm 1    | 2017/06/20 19:58:39 | Heap
INFO   | jvm 1    | 2017/06/20 19:58:39 |  PSYoungGen      total 2540032K, used 935016K [0x0000000759080000, 0x00000007fff80000, 0x0000000800000000)
INFO   | jvm 1    | 2017/06/20 19:58:39 |   eden space 2342400K, 36% used [0x0000000759080000,0x000000078d9adef8,0x00000007e8000000)
INFO   | jvm 1    | 2017/06/20 19:58:39 |   from space 197632K, 37% used [0x00000007f3e80000,0x00000007f866c470,0x00000007fff80000)
INFO   | jvm 1    | 2017/06/20 19:58:39 |   to   space 195072K, 0% used [0x00000007e8000000,0x00000007e8000000,0x00000007f3e80000)
INFO   | jvm 1    | 2017/06/20 19:58:39 |  ParOldGen       total 5470208K, used 1553638K [0x000000060b200000, 0x0000000759000000, 0x0000000759080000)
INFO   | jvm 1    | 2017/06/20 19:58:39 |   object space 5470208K, 28% used [0x000000060b200000,0x0000000669f399e8,0x0000000759000000)
INFO   | jvm 1    | 2017/06/20 19:58:39 |  PSPermGen       total 230400K, used 230349K [0x00000005eb200000, 0x00000005f9300000, 0x000000060b200000)
INFO   | jvm 1    | 2017/06/20 19:58:39 |   object space 230400K, 99% used [0x00000005eb200000,0x00000005f92f3650,0x00000005f9300000)
INFO   | jvm 1    | 2017/06/20 19:58:39 | 
INFO   | jvm 1    | 2017/06/20 19:58:47 | 135695.454: Total time for which application threads were stopped: 0.0026100 seconds, Stopping threads took: 0.0001660 seconds
INFO   | jvm 1    | 2017/06/20 19:58:52 | Returning empty Certificate from getAcceptedIssuers

From this, I need following blocks as output:

  1. 25306.601: Total time .... (and similar single lines starting with a number)
  2. Full thread dump Java....
  3. the whole something-1 block
  4. the whole something-2 block, including 'Caused by' blocks
  5. Heap block
  6. Returning empty Certificate..... (and similar single lines)

Currently, I'm using this in my input:

input {
file {
type => "sometype"
path => "somepath"
start_position => "beginning"
#ignore_older => 0
sincedb_path => "NUL"
#Forces to re-parse
codec => multiline {
pattern => "(?m)(%{DATA}|%{DATA}|%{DATA}| (( )|(\t))+)|(%{DATA}|%{DATA}|%{DATA}| Caused)"
what => "previous"
auto_flush_interval => 2
}
}
}

The output is then fed to elasticsearch.

My issue is the processing rate is very slow. It is like 4-5 docs as output, per minute.
I usually get 2000-2500 docs/s as output, for some other type of log files.

I thought that maybe multiline codec is the one that is causing such drastic performance drop.

Please help.
Thank you.


(Nikhil S) #2

Sorry due to max character limit, couldn't post my filter section. Here it is:
filter {

if [type] == "sometype" {

	
	if "prio" in [message]
	{
		if ("multiline" in [tags]) {
			grok {
				match => {
					"message" => "(?m)%{DATA}\| %{YEAR:year}/%{MONTHNUM2:month}/%{MONTHDAY:day} %{TIME:time} \|%{DATA}nid=%{BASE16NUM:nid_hex}%{DATA}java.lang.Thread.State: %{DATA:thread_state}((\n)|(\r))%{DATA}"
				}
			}
		}
		else {
			grok {
				match => {
					"message" => "(?m)%{DATA}\| %{YEAR:year}/%{MONTHNUM2:month}/%{MONTHDAY:day} %{TIME:time} \|%{DATA}nid=%{BASE16NUM:nid_hex}%{DATA}"
				}
			}
		}
		ruby { 
			code => "event.set('nid_int', event.get('nid_hex').hex)" 
		}
		mutate {
			add_field => {
				"timestamp" => "%{year}-%{month}-%{day} %{time}"
			}
		}
		date {
			match => ["timestamp", "YYYY-MM-dd HH:mm:ss"]
			remove_field => ["timestamp","year","month","day","time"]
			#locale => "en"
			# timezone => "Asia/Kolkata"
			timezone => "Etc/GMT"			
		}
    }
	else if "Exception" in [message] {
		# EXCEPTION LINES
		# GET EXCEPTIONS
		if "Caused by" in [message] {
			# trace
			grok {
				# break_on_match => "false"
				match => { "message" => "(?m)%{DATA}\| %{YEAR:year}/%{MONTHNUM2:month}/%{MONTHDAY:day} %{TIME:time} \|%{DATA}\.(?<exception>[a-zA-Z]+)Exception:%{DATA}Caused by:%{DATA}\.(?<caused_by>[a-zA-Z]+)Exception%{GREEDYDATA}" }
				add_field => {
					"timestamp" => "%{year}-%{month}-%{day} %{time}"
				}
			}
		}
		else if "Exception" in [message] {
			grok {
				match => { "message" => "(?m)%{DATA}\| %{YEAR:year}/%{MONTHNUM2:month}/%{MONTHDAY:day} %{TIME:time} \|%{DATA}\.(?<exception>[a-zA-Z]+)Exception:%{GREEDYDATA}" }
				add_field => {
					"timestamp" => "%{year}-%{month}-%{day} %{time}"
				}
			}
		}
		date {
			match => ["timestamp", "YYYY-MM-dd HH:mm:ss"]
			remove_field => ["timestamp","year","month","day","time"]
			#locale => "en"
			timezone => "Etc/GMT"		
		}
	}
	else 
	{
		#non threaddump lines
		#drop {}
		grok {
			match => {
				"message" => "%{DATA:loglevel}%{SPACE}\|%{SPACE}%{DATA:field}%{SPACE}\| %{YEAR:year}/%{MONTHNUM2:month}/%{MONTHDAY:day} %{TIME:time} \| %{GREEDYDATA:details}"
			}
		}
		mutate {
			add_field => {
				"timestamp" => "%{year}-%{month}-%{day} %{time}"
			}
		}
		date {
			match => ["timestamp", "YYYY-MM-dd HH:mm:ss"]
			remove_field => ["timestamp","year","month","day","time","message"]
			#locale => "en"
			# timezone => "Asia/Kolkata"
			timezone => "Etc/GMT"			
		}
		if ("Total time for which application threads were stopped" in [details]) {
			#drop {}
			grok {
					match => { "details" => "%{NUMBER:collection_time:float}: Total time for which application threads were stopped: %{NUMBER:app_thread_stopped:float} seconds, Stopping threads took: %{NUMBER:stop_threads:float} seconds"
				}
			}
		}
		else if("PSYoungGen" in [details] and "ParOldGen" not in [details])
		{
			# drop {}
			grok { 							#-- [PSYoungGen: 2538481K->2538481K(2538496K)] 7869259K->8009199K(8009216K), 0.8418280 secs] [Times: user=1.67 sys=0.47, real=0.84 secs]
					match => { "details" => "%{DATA}\[PSYoungGen: %{NUMBER:young_gen_before:int}K->%{NUMBER:young_gen_after:int}K\(%{NUMBER:young_gen_total:int}K\)\] %{NUMBER:overall_before:int}K->%{NUMBER:overall_after:int}K\(%{NUMBER:overall_total:int}K\), %{NUMBER:gc_time:float} secs\] \[Times: user=%{NUMBER:user_time:float} sys=%{NUMBER:sys_time:float}, real=%{NUMBER:real_time:float} secs\]"
				}
			}
			mutate {
				add_field => {
					"linetype" => "gc"
				}
			}
		}
		else if ("ParOldGen" in [details]){
			# drop {}
			grok { 							# [PSYoungGen: 83718K->0K(2549248K)] [ParOldGen: 5400973K->1594715K(5470208K)] 5484691K->1594715K(8019456K) [PSPermGen: 230277K->230277K(230400K)], 2.4420490 secs] [Times: user=7.85 sys=0.00, real=2.44 secs] 
					match => { "details" => "%{SPACE}\[PSYoungGen: %{NUMBER:young_gen_before:int}K->%{NUMBER:young_gen_after:int}K\(%{NUMBER:young_gen_total:int}K\)\] \[ParOldGen: %{NUMBER:old_gen_before:int}K->%{NUMBER:old_gen_after:int}K\(%{NUMBER:old_gen_total:int}K\)\] %{NUMBER:overall_before:int}K->%{NUMBER:overall_after:int}K\(%{NUMBER:overall_total:int}K\) \[PSPermGen: %{NUMBER:perm_gen_before:int}K->%{NUMBER:perm_gen_after:int}K\(%{NUMBER:perm_gen_total:int}K\)\], %{NUMBER:gc_time:float} secs\] \[Times: user=%{NUMBER:user_time:float} sys=%{NUMBER:sys_time:float}, real=%{NUMBER:real_time:float} secs\]"
				}
			}
			mutate {
				add_field => {
					"linetype" => "fullgc"
				}
			}
		}
		else if ("Cannot allocate memory" in [details]) {
			mutate {
				add_field => {
					"linetype" => "outofmemory"
				}
			}
		}
		else {
			drop {}
		}
	}
}	

}


(Nikhil S) #3

bump?


(Christian Dahlqvist) #4

You are using a lot of DATA patterns, which can be expensive to process. Try replacing these with as specific patterns as possible and see how that affects throughput.


(Nikhil S) #5

Thanks, will try it and let you know.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.