Why not always use %{DATA} as the pattern in the initial grok match?

iamthealex · April 22, 2016, 2:34pm

On the grok documentation page, a suggested pattern for matching log entries looks like this:

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

In other words, the suggestion is to use specific syntax patterns (IP, WORD, NUMBER, and so on) as often as possible.

However, what if I also have a field called true-client-ip that may contain either garbage or a real IP?
I don't want my grok parse to fail if the value of the true-client-ip field doesn't look like an IP.
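To make the failure mode concrete, here is a minimal sketch assuming a hypothetical log layout in which true-client-ip is appended as the last token after the fields from the documentation example. With one strict pattern, a single non-IP token makes the whole match fail.

```
filter {
  grok {
    # Hypothetical layout: the documentation's fields plus a trailing true-client-ip.
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration} %{IP:true-client-ip}" }
  }
}
# Input:  55.3.244.1 GET /index.html 15824 0.043 unknown
# Result: none of the fields are extracted and the event is tagged _grokparsefailure,
#         even though everything except the last token was well formed.
```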

So, I'm tempted to use %{DATA} for almost all my fields, and then to add extra decoration if I can grok the field using the hoped-for syntax.

For example, I'm proposing an initial grok that uses %{DATA} to avoid grok parse failures, followed by a second grok filter that tries to match the value of the true-client-ip field and, on a successful match, adds a new field such as valid-true-client-ip.

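filter {
  grok {
    # Anchored so that a value that merely contains an IP somewhere doesn't count as valid
    match => { "true-client-ip" => "^%{IP:valid-true-client-ip}$" }
    # Non-IP values are expected here, so don't tag those events with _grokparsefailure
    tag_on_failure => []
  }
}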

warkolm · April 24, 2016, 12:00am

Your proposed solution will work given the data content. There's definitely no reason not to use it!

magnusbaeck · April 24, 2016, 5:21pm

Quoting iamthealex: "So, I'm tempted to use %{DATA} for almost all my fields, and then to add extra decoration if I can grok the field using the hoped-for syntax."

The DATA pattern matches any characters (it's defined as .*?), so the results may surprise you. I've seen a number of cases where people used more than one DATA or GREEDYDATA pattern in the same expression and got really weird results for some inputs because one of the patterns matched too much.
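As an illustration of that kind of surprise, here is a hypothetical pattern with three DATA fields and an input line whose request path happens to contain a space. The field names and sample line are invented for this sketch; the result comment follows from DATA being the lazy .*? and the pattern being anchored, not from a captured Logstash run.

```
filter {
  grok {
    # Hypothetical three-field layout: method, request, status.
    match => { "message" => "^%{DATA:method} %{DATA:request} %{DATA:status}$" }
  }
}
# Input:  GET /my file.html 200
# Result: method => "GET", request => "/my", status => "file.html 200"
# The match succeeds, so no _grokparsefailure tag warns you that the fields are misaligned.
```

If all three fields were NOTSPACE instead, this four-token line would not match at all and would be tagged _grokparsefailure, so the problem would be visible rather than silent.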

In this particular case I'd use NOTSPACE (\S+) instead of DATA, at least if the garbage IP address values don't contain spaces. That will also perform better than a DATA pattern that the regex engine might need to backtrack from.
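Combining the two suggestions, a minimal sketch of the whole filter might look like this. It assumes a hypothetical log layout in which true-client-ip is the last space-separated token after the fields from the documentation example; the ^ and $ anchors are an addition for this sketch so that each pattern must account for the entire value.

```
filter {
  # First pass: NOTSPACE accepts any single token, so a garbage
  # true-client-ip no longer makes the whole match fail.
  grok {
    match => { "message" => "^%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration} %{NOTSPACE:true-client-ip}$" }
  }
  # Second pass: add valid-true-client-ip only if the token really is an IP.
  grok {
    match => { "true-client-ip" => "^%{IP:valid-true-client-ip}$" }
    tag_on_failure => []  # non-IP values are expected, so don't tag the event
  }
}
# "55.3.244.1 GET /index.html 15824 0.043 unknown"  -> true-client-ip => "unknown", no valid-true-client-ip
# "55.3.244.1 GET /index.html 15824 0.043 10.1.2.3" -> valid-true-client-ip => "10.1.2.3"
```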

iamthealex · April 25, 2016, 3:20pm

Ohh, nice! I'll use NOTSPACE instead of DATA and gain both performance and better discrimination. Thanks.
