Match complete line after some regex pattern

I have lines in a document like this:

*** Begin time:
Sat Jun 26 21:11:14 AEST 2019

I want to assign a new field begin_time = Sat Jun 26 21:11:14 AEST 2019

This is what I tried:

if x =~ /(\bBegin time:\s+(.*))/
                    begin_time = x

This returns an empty string.

if x =~ /(\b?.:?Begin time:\s+(\S+.*))/
                    begin_time = x

This also returns an empty string. Trying it out in Rubular, it works (matches the next line's content): \bBegin time:\s+(.*)
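Since the timestamp sits on the line after the marker, matching the marker line itself can never capture it. One way to grab the following line while iterating is Ruby's each_cons(2); a minimal, self-contained sketch (the sample text is taken from the post above):

```ruby
# The date is on the line AFTER "*** Begin time:", so matching the
# marker line itself captures nothing useful.
text = <<~LOG
  *** Begin time:
  Sat Jun 26 21:11:14 AEST 2019
LOG

begin_time = ""
text.lines(chomp: true).each_cons(2) do |line, next_line|
  # When the current line is the marker, the value is the next line.
  begin_time = next_line if line =~ /\bBegin time:/
end

puts begin_time  # => Sat Jun 26 21:11:14 AEST 2019
```

Each iteration sees a line together with the line after it, so the value can be read from next_line whenever the marker matches.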

Hi,

You can use grok for that:

grok {
  match => {
    "x" => "Begin time:\s+%{GREEDYDATA:begin_time}$"
  }
}

But you can't create and set an event field the way you are trying to: a plain assignment only sets a local variable. You need event.set.

Cad.
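To illustrate the point about assignment vs. event.set: inside a ruby filter, a plain variable assignment is local to the script and never lands on the event; only event.set creates the field. A minimal sketch, using a hash-backed FakeEvent stand-in (hypothetical; the real event object is supplied by Logstash at runtime):

```ruby
# Stand-in for the Logstash event API: get/set over a plain hash.
# (The real event object is provided to the ruby filter at runtime.)
class FakeEvent
  def initialize(data)
    @data = data
  end

  def get(field)
    @data[field]
  end

  def set(field, value)
    @data[field] = value
  end
end

event = FakeEvent.new("message" => "*** Begin time:\nSat Jun 26 21:11:14 AEST 2019")

line = event.get("message").lines(chomp: true).last
begin_time = line             # local variable: invisible outside the script
event.set("begin_time", line) # this is what actually creates the field

puts event.get("begin_time")  # => Sat Jun 26 21:11:14 AEST 2019
```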

Unfortunately, this block does not work (I keep getting errors). This is the "expanded" code:

filter {
        ruby {
          code => '
            lines = event.get("message").lines(chomp: true)
            begin_time = ""
            end_time = ""
            lines.each { |x|
                if x =~ /(Begin time)/
                    begin_time = x
                elsif x =~ /(End time)/
                    end_time = x
...

In the same file (log) I have begin and end time timestamps, and also a couple of other variables. All of them work fine when the value is on the same line, but I cannot manage to map a value from the next line, as in the example.

[2021-09-17T14:34:16,392][ERROR][logstash.agent ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Expected one of #, => at line 118, column 8 (byte 4435) after filter {\n\truby {\n..

This is a pipeline configuration error: something is missing, maybe a curly bracket or a double quote was not closed.

It says where the error is: :message=>"Expected one of #, => at line 118, column 8".

I know, but it only appears when I add the grok to the filter; otherwise it does not.

This is the code; I don't really see anything wrong with it:

input {
    file {
        path => "/etc/logstash/files/*"
        codec => multiline {
            pattern => "^$"
            negate => true
            what => "next"
            max_lines => 14000
            auto_flush_interval => 5
        }
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

filter {
    ruby {
        code => '
            lines = event.get("message").lines(chomp: true)
            begin_time = ""
            end_time = ""
            logData = ""
            user = ""
            lines.each { |x|
                if x =~ /(Begin time)/
                    begin_time = x
                elsif x =~ /(End time)/
                    end_time = x
                    #end_time = end_time * ""
                elsif x =~ /^(USER)/
                    user = x.scan(/(?:.?USER=)(.*)/)[0]
                    user = user * ""
                else
                    unless x =~ /^(\s|sending|total|rsl|sent|\[)/
                        logData += x + ","
                    end
                end
            }
            event.set("begin_time", begin_time)
            event.set("end_time", end_time)
            event.set("logData", logData)
            event.set("user", user)
        '
    }

    # I PUT GROK HERE AND GET AN ERROR
    # grok {
    #   match {
    #     "x" => "Begin time:\s+%{GREEDYDATA:begin_time}$"
    #   }
    # }

    mutate {
        remove_field => ["tags", "message", "@version", "@timestamp", "host", "path"]
    }
}

output {
    elasticsearch {
        hosts => ["XXXX"]
        index => "log_logs"
    }
    stdout {
        codec => rubydebug
    }
}

I managed to do something and grok is now working, but since my file has multiple lines after "Begin time", it saves all the rest of the document under the begin_time field. How can I make it store only that one (next) line where the date is?

Thank you, now it looks like:

"begin_time" : """
Sat Jun 26 16:56:16 AEST 2021
rm: cannot remove '/scrXXXX/003': No such file or directory
IDS=4947802324992
ENV=BATCH
LD_LIBRARY_PATH=/apps/ncl/6.6.2/l
...

That's not a line, it is two lines. You use a multiline codec to combine multiple lines into a single [message] field, but then you are using

lines = event.get("message").lines(chomp: true)

in order to process the lines one at a time. You could try something like

message = event.get("message")
mdata = message.match(/Begin Time:[^\n]*\n([^\n]*)\n/)
if mdata
    event.set("begin_time", mdata[0])
end

I'm using chomp: true because I'm extracting 30 more fields which are on the same line (field: value, e.g.). Only begin_time and end_time have their value on the next line.

I added a new message variable and tried out this match, and the result is:

"start_time" : """
Starting time:
Sat Jun 26 16:56:16 AEST 2021
"""

Looks like it should be event.set("begin_time", mdata[1])
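Right, mdata[0] is the entire matched text (the marker line plus the captured line, newlines included), while mdata[1] is the first capture group. A quick stand-alone illustration with the sample from above:

```ruby
# MatchData indexing: [0] is the whole match, [1] is the first capture group.
message = "Starting time:\nSat Jun 26 16:56:16 AEST 2021\nmore lines\n"

mdata = message.match(/Starting time:[^\n]*\n([^\n]*)\n/)

puts mdata[0]  # whole match, including the marker line and newlines
puts mdata[1]  # => Sat Jun 26 16:56:16 AEST 2021
```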

Thanks

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.