How to extract rows with Logstash

Hi,
I have this log file:

2017-06-08 12:10:12,924 | INFO | nager-plugin-out | ExtraDataServices | 302 -
2017-06-08 12:10:12,925 | DEBUG | nager-plugin-out | deleteByExample | 285 -
2017-06-08 12:10:12,925 | DEBUG | nager-plugin-out | deleteByExample | 285 - o
2017-06-08 12:10:12,926 | DEBUG | nager-plugin-out |deleteByExample | 285 - o
2017-06-08 12:10:12,926 | INFO | nager-plugin-out | ExtraDataServices | 302

I want to extract only the lines where the INFO level appears.

I'm using this configuration in Logstash:

input {
  file {
    path => "C:\Users\Lock\Desktop\info.3"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
  }
}

filter {
  grok {
    match => ["message", "(?<info>^$.*\| INFO  \|.*\n)"]
    break_on_match => true
    add_field => { "type" => "log_info" }
  }
}

output {
  file {
    path => "C:\Users\Lock\Desktop\Output_info.log"
  }
  elasticsearch {
    index => "info-%{+YYYY.MM.dd}"
    document_type => "log_info"
  }
  stdout { codec => rubydebug }
}

but this configuration doesn't work because it takes all rows!

Replace

match => ["message", "(?<info>^$.*\| INFO  \|.*\n)"]

with

match => ["message", "(?<info>.*\| INFO \|.*)"]

(single space, not two spaces, after INFO). Then you can drop the events that did not match with

if ! ( "" in [info] ) {drop{}}

OK, perfect, many thanks... now it works,
but there are two spaces after INFO :slightly_smiling_face:
Right. Now I have this:

{"path":"C:\\Users\\Lock\\Desktop\\as.3","@timestamp":"2017-11-02T20:13:03.325Z","@version":"1","host":"Lock","message":"2017-06-08 12:10:12,906 | INFO  | nager-plugin-out |

OK, now I would like to delete the path, timestamp, and host fields.
So I thought of using the mutate filter, but...
when I use it for the host field and the path field it works properly,
but if I also use it for the @timestamp and @version fields, the system gives me an error.

input {
  file {
    path => "C:\Users\Lock\Desktop\as.3"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
  }
}

filter {
  grok {
    match => ["message", "(?<info>.*\| INFO  \|.*)"]
    break_on_match => true
    add_field => { "type" => "log_info" }
  }

  if !("" in [info]) { drop {} }

  mutate { remove_field => ["path", "host"] }
  #mutate { remove_field => ["path", "@timestamp", "@version", "host"] } # if I use this row I get errors
}


output {
  file {
    path => "C:\Users\Lock\Desktop\Output_as.log"
  }
  elasticsearch {
    index => "error-%{+YYYY.MM.dd}"
    document_type => "log_info"
  }
  stdout { codec => rubydebug }
}

Where is the error? Where am I going wrong?

What version are you running and what error message do you get?

In V6.0.0-rc1, which is what I am running, using remove_field to remove @version and @timestamp works just fine. It may be a limitation in an earlier release.

Not sure why you would not want a timestamp. If you want to set the timestamp to the time in the message you can do that using

filter {
  grok {
    match => ["message", "(?<ts>[^|]*) \| %{WORD:level}%{SPACE} \| .*"]
  }
  if "DEBUG" == [level] {drop{}}
  date {
    match => [ "ts", "yyyy-mm-DD HH:mm:ss,SSS" ]
    timezone => "Europe/Rome"
  }
  mutate {remove_field => [ "path", "host", "ts" ] }
}

Hi, many thanks...
but I don't fully understand how this line works.

PS: I don't want @timestamp because it is the timestamp of my operation, and I only want the timestamp of my log line.

Your grok filter is OK... thanks!!! :smile:

My version is 5.5.3.
The error is:

[2017-11-03T17:37:53,610][FATAL][logstash.runner          ] An unexpected error occurred! {:error=>#<LogStash::Error: timestamp field is missing>, :backtrace=>["org/logstash/ext/JrubyEventExtLibrary.java:202:in `sprintf'", "C:/Users/Lock/Desktop/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.7-java/lib/logstash/outputs/elasticsearch/common.rb:173:in `event_action_params'", "C:/Users/Lock/Desktop/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.7-java/lib/logstash/outputs/elasticsearch/common.rb:48:in `event_action_tuple'", "C:/Users/Lock/Desktop/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.7-java/lib/logstash/outputs/elasticsearch/common.rb:42:in `multi_receive'", "org/jruby/RubyArray.java:2414:in `map'", "C:/Users/Lock/Desktop/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-7.3.7-java/lib/logstash/outputs/elasticsearch/common.rb:42:in `multi_receive'", "C:/Users/Lock/Desktop/logstash/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:13:in `multi_receive'", "C:/Users/Lock/Desktop/logstash/logstash-core/lib/logstash/output_delegator.rb:47:in `multi_receive'", "C:/Users/Lock/Desktop/logstash/logstash-core/lib/logstash/pipeline.rb:420:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "C:/Users/Lock/Desktop/logstash/logstash-core/lib/logstash/pipeline.rb:419:in `output_batch'", "C:/Users/Lock/Desktop/logstash/logstash-core/lib/logstash/pipeline.rb:365:in `worker_loop'", "C:/Users/Lock/Desktop/logstash/logstash-core/lib/logstash/pipeline.rb:330:in `start_workers'"]}

I see. It worked for me because I am only using a stdout / rubydebug output. So the remove_field mutation worked just fine. The problem is the elasticsearch output requires every event to have a timestamp.
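
So keep @timestamp, set it from the log line with the date filter shown above, and only remove the other fields. A rough sketch (removing @version as well should be fine here, since the error was only about the missing timestamp):

filter {
  # @timestamp now carries the time from the log line, so the
  # elasticsearch output can still build the index name "info-%{+YYYY.MM.dd}"
  date {
    match => [ "ts", "yyyy-MM-dd HH:mm:ss,SSS" ]
    timezone => "Europe/Rome"
  }
  mutate { remove_field => [ "path", "host", "ts", "@version" ] }
}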

Hi friends,
I have another problem.
I want to extract from these rows only the ones where both INFO and nager-plugin-out are present.

2017-06-04 07:22:39,307 | INFO  | ActiveMQ Task-1  | FailoverTransport                | 128 - 
2017-06-05 09:04:36,082 | INFO  | nager-plugin-out | ManagedManagementStrategy        | 
2017-06-05 09:04:36,086 | INFO  | nager-plugin-out | DefaultTypeConverter             | 99 - 
2017-06-05 09:04:36,089 | INFO  | nager-plugin-out | DefaultRuntimeEndpointRegistry   |  
2017-06-05 09:04:36,090 | INFO  | nager-plugin-out | DefaultCamelContext              | 99 - 

I tried this configuration, but grok doesn't match:

filter {
  grok {
    match => ["message", "(?<ts>[^|]*) \| %{WORD:level}%{SPACE} \|.*| %{WORD:level1}%{SPACE} \|.*"]
  }
  if ( "INFO" != [level] AND "nager-plugin-out" != [level1] ) { drop {} }
  date {
    match => [ "ts", "yyyy-MM-dd HH:mm:ss,SSS" ]
    timezone => "Europe/Rome"
  }
}

Where am I going wrong?

There are four issues.

Firstly, boolean operators have to be lowercase. You should see a message like 'Expected one of #, and, or, xor, nand, ) at line 16, column 29 (byte 314) after filter ...'. So if you fold that AND down to and, the configuration will at least compile.

Secondly, you want the condition to be or, not and.

Thirdly, you have an unescaped | in your regex, which is used for alternation (pattern1 or pattern2), and an extra .*.

Fourthly, you have used WORD, which in my locale does not allow the word to contain -. It is a sequence of \w, which for me is [A-Za-z0-9_].

Putting it all together, try this:

  grok {
    match => ["message", "(?<ts>[^|]*) \| %{WORD:level}%{SPACE} \| (?<msg>[^|]*)%{SPACE} \|.*"]
  }
  if ( "INFO" != [level] or "nager-plugin-out" != [msg] ) {drop{}}

OK, good... many thanks for the help.
I have another problem.

2017-07-13 13:32:32,562 | WARN  | ...other words.. {"request":{"cks":31155,"terminal"....}

I can extract a line where there is a WARN, but I also want to extract and process the JSON that comes after it: {"request":{"cks":31155,"terminal"........}
I'm trying with this configuration:

filter {
  grok {
    match => ["message", "(?<check>).*\| WARN  \|.*(?<check_json>){\"request\".*}"]
    break_on_match => true
    add_field => { "type" => "log_warn" }
  }
  if !("" in [check]) { drop {} }

  mutate {
    remove_field => ["path", "host", "check"]
    add_tag => [ "WARNING" ]
  }

  if "_grokparsefailure" in [tags] { drop {} }
  else {
    json {
      source => "check_json"
    }
  }
}

What do you think?

OK, it works this way:

grok {
  match => ["message", "(?<check>.*\| WARN  \|.*)(?<check_json>{\"request\".*})"]
  break_on_match => true
  add_field => { "type" => "log_warn" }
}
if !("" in [check]) { drop {} }

mutate {
  remove_field => ["path", "host", "check_json"]
  add_tag => [ "WARNING" ]
}

If I want to send an email whose body contains "message" and the JSON, how can I do that?

email {
  codec       => "plain"
  contenttype => "text/html; charset=UTF-8"

  ............

  via         => "smtp"
  body        => "%{message}",%{check_json}

This way I correctly see only "message".

You used mutate/remove_field to delete check_json, so by the time the event reaches the email output it no longer exists.
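
A rough sketch of the fix: keep check_json on the event and reference it in the body (the other mail settings stay as in your snippet):

filter {
  mutate {
    remove_field => ["path", "host"]   # keep check_json so the output can still read it
    add_tag => [ "WARNING" ]
  }
}

output {
  email {
    codec       => "plain"
    contenttype => "text/html; charset=UTF-8"
    # ... the rest of the mail settings as in your configuration ...
    via         => "smtp"
    body        => "%{message}, %{check_json}"
  }
}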
