How to extract the beginning and end of a long single-line log without parsing the middle


(Metricbeat) #1

Hi,

Suppose that I have a single-line log of about 500 words. I'm interested in extracting the first 5 words and the last 5 words, and I'm not interested in parsing the middle.

Is there a way to do this?

I'm trying to save the compute I would otherwise spend filtering the many words in the middle that I'm not interested in parsing.

Thanks
Luis


#2

I would use a pair of grok patterns, one anchored with ^ (start of line) and one with $ (end of line).
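A minimal sketch of the two-pattern idea using Python's `re` module, assuming grok's `%{WORD}` behaves like `\b\w+\b` (its definition in the standard grok patterns). Python's engine is not the Oniguruma engine that Logstash uses, so this only illustrates the anchoring; the function name `head_and_tail` and the sample line are hypothetical.

```python
import re

# Assumption: grok's %{WORD} expands to \b\w+\b.
WORD = r"\b\w+\b"
FIVE = r" ".join(f"({WORD})" for _ in range(5))

# Anchored at the start of the line (^): the first five words.
HEAD = re.compile(r"^" + FIVE)
# Anchored at the end of the line ($): the last five words.
TAIL = re.compile(FIVE + r"$")


def head_and_tail(line):
    """Return (first five words, last five words), or None if either pattern fails."""
    head = HEAD.match(line)
    tail = TAIL.search(line)
    if head is None or tail is None:
        return None
    return head.groups(), tail.groups()


line = "a1 a2 a3 a4 a5 " + "mid " * 490 + "z1 z2 z3 z4 z5"
print(head_and_tail(line))
# → (('a1', 'a2', 'a3', 'a4', 'a5'), ('z1', 'z2', 'z3', 'z4', 'z5'))
```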


(Magnus Bäck) #3

> I would use a pair of grok patterns, one anchored with ^ (start of line) and one with $ (end of line).

Is that more efficient than having a single expression with .* in the middle?
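For comparison, the single-expression alternative can be sketched the same way (again assuming `%{WORD}` is `\b\w+\b`; `head_and_tail_single` is a hypothetical name). Because the expression is anchored with `^`, the engine attempts a match only at the start of the line: the greedy `.*` runs to the end and then backtracks just far enough for the last five words to fit.

```python
import re

WORD = r"\b\w+\b"  # assumption: grok's %{WORD} pattern
FIVE = r" ".join(f"({WORD})" for _ in range(5))

# One anchored expression: first five words, skip the middle with .*, last five words.
SINGLE = re.compile(r"^" + FIVE + r".*" + FIVE + r"$")


def head_and_tail_single(line):
    """Return (first five words, last five words), or None if the line does not match."""
    m = SINGLE.match(line)
    if m is None:
        return None
    groups = m.groups()
    return groups[:5], groups[5:]


line = "a1 a2 a3 a4 a5 " + "mid " * 490 + "z1 z2 z3 z4 z5"
print(head_and_tail_single(line))
# → (('a1', 'a2', 'a3', 'a4', 'a5'), ('z1', 'z2', 'z3', 'z4', 'z5'))
```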


#5

Definitely not. But if I think of it as logically doing two different things (grabbing words from the beginning and grabbing words from the end), I would still be tempted to use two patterns.

For example, if I have a Kafka topic that sends 20,000 identical lines, each containing 500 words, I can read it in two pipelines, one of which does

grok {
    # One expression: first five words, skip the middle with .*, last five words
    match => { "message" => [ "^%{WORD:a} %{WORD:b} %{WORD:c} %{WORD:d} %{WORD:e}.*%{WORD:v} %{WORD:w} %{WORD:x} %{WORD:y} %{WORD:z}$" ] }
    id => "grok with 1 pattern"
}

and another that does

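grok {
    # Two expressions: one anchored at the start (^), one anchored at the end ($).
    # break_on_match => false makes grok try both instead of stopping at the first match.
    match => { "message" => [ "^%{WORD:a} %{WORD:b} %{WORD:c} %{WORD:d} %{WORD:e}", "%{WORD:v} %{WORD:w} %{WORD:x} %{WORD:y} %{WORD:z}$" ] }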
    break_on_match => false
    id => "grok with 2 patterns"
}

Then I can check duration_in_millis for each grok filter (identified by its id) in the pipeline stats. I see 17088 with one pattern vs. 56211 with two patterns, so the two-pattern version takes more than three times as long.
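One likely factor in that gap: the end-anchored pattern has no leading ^, so the regex engine may attempt a match at every word start across the whole line, while the single ^-anchored expression is tried once and backtracks only from the end. The harness below is a rough sketch for reproducing the comparison outside Logstash. It assumes `%{WORD}` is `\b\w+\b` and uses Python's `re` rather than Logstash's Oniguruma engine; the function names and the 500-word test line are hypothetical, so absolute timings will not match the figures above.

```python
import re
import timeit

WORD = r"\b\w+\b"  # assumption: grok's %{WORD} pattern
FIVE = r" ".join(f"({WORD})" for _ in range(5))

SINGLE = re.compile(r"^" + FIVE + r".*" + FIVE + r"$")
HEAD = re.compile(r"^" + FIVE)
TAIL = re.compile(FIVE + r"$")  # no ^ anchor: search() may try every start position


def one_pattern(line):
    m = SINGLE.match(line)
    return None if m is None else (m.groups()[:5], m.groups()[5:])


def two_patterns(line):
    head, tail = HEAD.match(line), TAIL.search(line)
    return None if head is None or tail is None else (head.groups(), tail.groups())


# Hypothetical 500-word line: 5 head words, 490 filler words, 5 tail words.
line = " ".join([f"a{i}" for i in range(5)] + ["mid"] * 490 + [f"z{i}" for i in range(5)])

# Both approaches must extract the same words before timing them.
assert one_pattern(line) == two_patterns(line)
for fn in (one_pattern, two_patterns):
    seconds = timeit.timeit(lambda: fn(line), number=1000)
    print(f"{fn.__name__}: {seconds:.3f} s for 1000 lines")
```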


(Metricbeat) #6

Thanks for the pointers!

