Suppose I have a single-line log entry of roughly 500 words. I want to extract the first 5 words and the last 5 words, and I'm not interested in parsing the middle.
Is there a way to do this?
I'm trying to avoid spending compute on filtering the words in the middle, which I don't need to parse.
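Outside of Logstash, the idea of touching only the two ends of the line can be sketched in plain Python. This is a swapped-in technique for illustration, not grok: it assumes words are separated by whitespace, and `ends_of_line` is a hypothetical helper name. `str.split(maxsplit=n)` stops after `n` separators and keeps the rest of the line as one string, and `str.rsplit(maxsplit=n)` does the same from the right, so the middle words are never broken into separate items.

```python
def ends_of_line(line: str, n: int = 5) -> tuple[list[str], list[str]]:
    """Return the first n and last n whitespace-separated words of line.

    Hypothetical helper. split(maxsplit=n) splits only the first n separators
    and leaves the remainder as one trailing string; rsplit(maxsplit=n) does
    the same from the right. The middle words are never split individually.
    """
    head = line.split(maxsplit=n)[:n]
    tail = line.rsplit(maxsplit=n)[-n:]
    return head, tail


if __name__ == "__main__":
    # Synthetic 500-word log line: "w0 w1 ... w499"
    line = " ".join(f"w{i}" for i in range(500))
    first, last = ends_of_line(line)
    print(first)  # ['w0', 'w1', 'w2', 'w3', 'w4']
    print(last)   # ['w495', 'w496', 'w497', 'w498', 'w499']
```

If a line has fewer than `2 * n` words, the head and tail lists will overlap, which may or may not be what you want.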
Definitely not. Still, since I'm logically doing two different things (grabbing words from the beginning and grabbing words from the end), I would be tempted to use two patterns.
If I have a Kafka topic that sends 20,000 identical lines of 500 words each, I can read it with two pipelines. One of them does
grok {
  match => { "message" => [ "^%{WORD:a} %{WORD:b} %{WORD:c} %{WORD:d} %{WORD:e}.*%{WORD:v} %{WORD:w} %{WORD:x} %{WORD:y} %{WORD:z}$" ] }
  id => "grok with 1 pattern"
}
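To see what the single-pattern grok is asking the regex engine to do, here is a rough Python translation. It assumes grok's stock `WORD` definition, `\b\w+\b`, and turns each `%{WORD:name}` into a Python named group; `grok_word` and `extract` are hypothetical helper names. Logstash evaluates grok with a different regex engine than Python's `re`, so this only approximates the behaviour.

```python
import re

# Grok's stock WORD pattern is \b\w+\b; %{WORD:a} captures it into field "a".
WORD = r"\b\w+\b"


def grok_word(name: str) -> str:
    """Expand %{WORD:name} into a Python named group (hypothetical helper)."""
    return f"(?P<{name}>{WORD})"


HEAD = " ".join(grok_word(n) for n in "abcde")
TAIL = " ".join(grok_word(n) for n in "vwxyz")

# One pattern anchored at both ends: five words, .* over the middle, five words.
# The greedy .* runs to the end of the line and then backs off only as far as
# needed for the last five words to match before $.
ONE_PATTERN = re.compile("^" + HEAD + ".*" + TAIL + "$")


def extract(line: str) -> dict[str, str]:
    """Return fields a..e and v..z, or an empty dict if the line doesn't match."""
    m = ONE_PATTERN.match(line)
    return m.groupdict() if m else {}


if __name__ == "__main__":
    line = " ".join(f"w{i}" for i in range(500))  # synthetic 500-word line
    fields = extract(line)
    print(fields["a"], fields["e"], fields["v"], fields["z"])  # w0 w4 w495 w499
```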
and another that does
grok {
  match => { "message" => [ "^%{WORD:a} %{WORD:b} %{WORD:c} %{WORD:d} %{WORD:e}", "%{WORD:v} %{WORD:w} %{WORD:x} %{WORD:y} %{WORD:z}$" ] }
  break_on_match => false
  id => "grok with 2 patterns"
}
Then I can look at the pipeline stats and check duration_in_millis for each filter id. I see 17088 with one pattern versus 56211 with two patterns, so the two-pattern filter takes about 3.3 times as long.
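A rough way to reproduce the comparison outside Logstash is the Python sketch below. One plausible explanation for the gap, stated here as an assumption rather than a confirmed cause, is that the second pattern is anchored only with `$`, so the engine has to attempt it at each starting position from the left of the line, while the single pattern is anchored with `^` and its greedy `.*` only backs off a few words from the end. The helper names are hypothetical, the line is synthetic, and absolute timings from Python's `re` will not match Logstash's numbers.

```python
import re
import timeit

WORD = r"\b\w+\b"  # grok's stock WORD pattern


def words(names: str) -> str:
    # Hypothetical helper: one named WORD group per letter, space-separated.
    return " ".join(f"(?P<{n}>{WORD})" for n in names)


# Single pattern: anchored at both ends, .* skips the middle.
ONE = re.compile("^" + words("abcde") + ".*" + words("vwxyz") + "$")
# Two patterns, mimicking break_on_match => false: the head pattern is
# anchored with ^, the tail pattern only with $, so search() tries it at
# successive starting positions until one succeeds.
HEAD = re.compile("^" + words("abcde"))
TAIL = re.compile(words("vwxyz") + "$")


def one_pattern(line: str) -> dict[str, str]:
    m = ONE.match(line)
    return m.groupdict() if m else {}


def two_patterns(line: str) -> dict[str, str]:
    fields: dict[str, str] = {}
    for pat in (HEAD, TAIL):
        m = pat.search(line)
        if m:
            fields.update(m.groupdict())
    return fields


if __name__ == "__main__":
    line = " ".join(f"w{i}" for i in range(500))  # synthetic 500-word line
    assert one_pattern(line) == two_patterns(line)  # same fields either way
    for fn in (one_pattern, two_patterns):
        secs = timeit.timeit(lambda: fn(line), number=20_000)
        print(f"{fn.__name__}: {secs:.3f} s for 20,000 lines")
```

Both functions extract the same ten fields; only the amount of work the regex engine does to find them differs.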