Is it possible to use split filter instead of grok in Logstash to chop message?


I am using ELK GA 6.3. I am using Logstash to read data from Kafka. My Kafka message is like;

<Jun 02, 2018 12:04:41:531 AM> <data1> <data2> <data3\n>

I am using the below grok pattern to split data;

grok {
	match => { "message" => "<%{GREEDYDATA:timestamp}> <%{GREEDYDATA:data1}> <%{GREEDYDATA:data2}> <%{GREEDYDATA:data3}>" }
}

Is it possible to achieve the same using the split filter?

Thank you.

The split filter splits a single message into multiple messages so it's not equivalent to what you get with a grok filter.
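To make the difference concrete, here is a sketch of what the split filter does (the terminator value is an assumption; it defaults to "\n" anyway). It turns one event into several events, one per line of the field, without extracting any named fields:

```
filter {
  split {
    # Split the "message" field on newlines: one event per line.
    # Each resulting event still has the whole line in "message";
    # nothing is parsed into timestamp/data1/data2/data3.
    field => "message"
    terminator => "\n"
  }
}
```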

Don't use multiple GREEDYDATA patterns like that. It's very inefficient and could easily match things incorrectly.

@magnusbaeck the reason I am using GREEDYDATA there is:

  1. GREEDYDATA is .*, so it matches everything. I want everything between < and >, and I don't want to perform validation. I believe that .* takes less effort compared to more specific patterns.
  2. My entire message is inside <tags> so that I have a start point < and end point > for messages.

Is this okay / still inefficient?

As you have well defined field separators, the dissect filter might be a good and efficient option here.
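A dissect sketch for the message format above (field names taken from the grok pattern; handling of the trailing \n is not shown and would need testing):

```
filter {
  dissect {
    # Dissect does no regex matching; it walks the string and cuts
    # at the literal delimiters "> <", which is much cheaper than grok.
    mapping => {
      "message" => "<%{timestamp}> <%{data1}> <%{data2}> <%{data3}>"
    }
  }
}
```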

It's still inefficient since it's greedy and will first attempt to stuff <data1> <data2> <data3> into the data1 field, but then it discovers that there's no text left for the two remaining GREEDYDATA patterns to match against, so it backtracks and tries to match <data1> <data2>, but then there's still one GREEDYDATA that doesn't get anything so... you get the idea.

Using DATA should be much more efficient, but I'd still expect it to be outperformed by (?<data1>[^>]+).
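Rewritten with that negated character class, the original grok might look like this (a sketch using the same field names; not benchmarked against this exact data):

```
grok {
  # [^>]+ matches up to the next ">" without backtracking,
  # unlike GREEDYDATA (.*), which first consumes the whole line.
  match => {
    "message" => "<(?<timestamp>[^>]+)> <(?<data1>[^>]+)> <(?<data2>[^>]+)> <(?<data3>[^>]+)>"
  }
}
```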


Oh :open_mouth: ok ok.. now I understand..

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.