Logstash and Grok Matching

Hi

My question is regarding Logstash and Grok.

If I have two filters in two different files

filter {
	if [message] =~ /Regex/ {
		grok {
			match => { "message" => "PATTERNA" }
		}
	}
}

filter {
	if [message] =~ /Regex/ {
		grok {
			match => { "message" => "PATTERNB" }
		}
	}
}

If a message matches the regex for both filter 1 and filter 2, but it fails to match the grok on PATTERNA, will it fail and exit there, or will it try PATTERNB in the second file?

Your question is convoluted and I don't believe I understand it properly.

Why not have one config file with two different regex patterns?
Why do you need two config files for ingesting the same data?

Well, in terms of performance, which is better?

A single config file with 100 grok filters to match against?

Or four files with 25 grok filters each?

Well, you can use the break_on_match option in grok to stop processing your regular expressions after the first full match: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
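For example, grok tries an array of patterns in order and, with break_on_match left at its default of true, stops at the first one that matches (the patterns here are just illustrative; TIMESTAMP_ISO8601, SYSLOGTIMESTAMP and GREEDYDATA are standard grok patterns):

```
filter {
	grok {
		# break_on_match => true is the default: grok stops
		# after the first pattern in the list that matches.
		break_on_match => true
		match => {
			"message" => [
				"^%{TIMESTAMP_ISO8601:ts} %{GREEDYDATA:msg}$",
				"^%{SYSLOGTIMESTAMP:ts} %{GREEDYDATA:msg}$"
			]
		}
	}
}
```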

If you do not have insane EPS, just do this with one config. Make sure that you have proper anchoring (^ and $) in your patterns for efficient matching, and sort your match expressions so that the most frequently matching ones are placed at the top.
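A sketch of what that ordering might look like (the formats and field names are made up; the point is the anchors and putting the most common format first):

```
filter {
	grok {
		match => {
			"message" => [
				# Most frequent format first; ^ and $ anchors let
				# grok reject non-matching lines quickly instead of
				# scanning for a match anywhere in the line.
				"^%{IP:client} %{WORD:method} %{URIPATH:path}$",
				"^%{SYSLOGLINE}$"
			]
		}
	}
}
```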

If you have an insane number of events per second, you will get better performance by distributing your input.

Hi,

Thanks for the response. We have high EPS, so I think splitting the filters, as we are currently doing, is the best approach anyway.

The difficulty arises when a regex matches two types of log message.

For example, a syslog regex could match against a firewall but also against an Ubuntu box.

Here we will need to split them into the two filters. I guess I just need to invest some time into working out some complicated logic.
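One sketch of that kind of split, using lightweight conditionals on the message content before the heavy grok work (the prefilter regexes and patterns are hypothetical; CISCOTAG and SYSLOGLINE are standard grok patterns):

```
filter {
	if [message] =~ /^%ASA-/ {
		# Looks like Cisco ASA firewall syslog
		grok {
			match => { "message" => "^%{CISCOTAG:tag}: %{GREEDYDATA:fw_msg}$" }
		}
	} else if [message] =~ /sshd|systemd/ {
		# Looks like host (e.g. Ubuntu) syslog
		grok {
			match => { "message" => "^%{SYSLOGLINE}$" }
		}
	}
}
```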

Regards,

Jason

You can try the following pipeline-to-pipeline idea, then:

input {
	file {
	        path => "/your/file/log1.log"
	        start_position => "beginning"
	        codec => json
	}
}

filter {
	grok {
		# Do some quick regex prefiltering here to detect what is
		# coming from where, then distribute in the output below.
		match => { "message" => "LIGHT_PREFILTER_PATTERN" }
	}
}

output {
	if [log1_field]{
		pipeline { 
			id => "YOUR_LOG1_PROCESSING_PIPELINE"
			send_to => LOG1_PROCESSING
		}
	} else if [log2_field]{
		pipeline {
			id => "YOUR_LOG2_PROCESSING_PIPELINE"
			send_to => LOG2_PROCESSING
		}
	} else {
		pipeline {
			id => "YOUR_LOG3_PROCESSING_PIPELINE"
			send_to => LOG3_PROCESSING
		}
	}
}

You do the very light regex expressions in the distributor and your heavy processing in separate pipelines. I do not know any other approach besides this, other than running Kafka and multiple Logstash nodes reading from the same topics to do the load balancing.

@pastechecker

Where did you place that configuration?

In the pipelines.yml file?

No.
pipelines.yml is the file that tells Logstash where to load your configs from, how many workers to use, batch sizes, etc.
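For illustration, an entry in pipelines.yml can also carry those tuning settings alongside the config path (the values below are made up; pipeline.workers and pipeline.batch.size are real Logstash settings):

```
- pipeline.id: distributor
  path.config: "/etc/logstash/conf.d/distributor_pipeline.conf"
  pipeline.workers: 4
  pipeline.batch.size: 250
```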

In your example, you would need something similar to:

pipelines.yml:

- pipeline.id: distributor_pipeline.conf
  path.config: "/etc/logstash/conf.d/distributor_pipeline.conf"
- pipeline.id: file_processing_1.conf
  path.config: "/etc/logstash/conf.d/file_processing_1.conf"
- pipeline.id: file_processing_2.conf
  path.config: "/etc/logstash/conf.d/file_processing_2.conf"
- pipeline.id: file_processing_3.conf
  path.config: "/etc/logstash/conf.d/file_processing_3.conf"

Your distributor pipeline would be the config pasted above, saved as distributor_pipeline.conf.

The beginning of the file_processing_* configs would be:

input { 
	pipeline { 
		address => LOG1_PROCESSING 
	} 
}
filter {
#do your detailed stuff here
}

output {
#output wherever you want
}

Is it clearer now?

Ignore.

Look above.
Also a good reading: https://www.elastic.co/guide/en/logstash/current/pipeline-to-pipeline.html

@pastechecker

Sorry. Thanks, that makes much more sense.

I'll give it a go :slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.