How to match lines in unstructured log starting with specific string

ansamHox · September 13, 2021, 5:50pm

I have a very large unstructured file. First, I want to parse all lines that start with ABC or CDE and store them as one document in Elasticsearch. Each file should become one document in the index, so @message should contain all the ABC and CDE lines.

Second, I also have an "ignore list" (lines starting with a space, "Timing", "----", etc.), and basically everything that is left after the parse is complete should be stored under a new key (data). The final result should look something like:

 "_source" : {
          "message" : "ABC V4.1.2 MODEL,CDE: 0 uri: xxxx.xxx"
          "data" : "ALL LINES THAT ARE NOT IN IGNORE LIST"
 }

This is my Logstash conf file for the first part, but for some reason it is not processing anything: the index is empty, and I can't see any errors in the logs, probably because I'm not saving it properly.

input {
    file {
        path => "/etc/logstash/files/*"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

filter {

  if ([message] !~ "^ABC"){
   drop{}
  }
  else if ([message] !~ "^CDE"){
    drop{}
  }
}

output{
    elasticsearch {
        hosts => ["XXX"]
        index => "index1"
 }
}

When I add one if expression it works, but when I add the second one, it doesn't process anything. Any help? Thank you

Badger · September 13, 2021, 6:38pm

If you want to combine all the lines from one file into a single document then you could do it using an aggregate filter, but I would use a multiline codec to read the entire file as a single event as described here.
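For reference, the usual way to read a whole file as one event is to give the codec a pattern that never matches, so that every line is appended to the previous one. A minimal sketch, reusing the file input from the question; the pattern string `^Spalanzani` is an arbitrary placeholder that is assumed never to occur in the file, and the `auto_flush_interval` value is also an assumption:

```
input {
    file {
        path => "/etc/logstash/files/*"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
            # placeholder pattern, assumed never to match any line
            pattern => "^Spalanzani"
            # lines that do NOT match the pattern are joined to the previous line
            negate => true
            what => "previous"
            # flush the pending event after 1 second without new lines
            auto_flush_interval => 1
        }
    }
}
```

With this in place, [message] holds the whole file, with the original lines separated by newlines.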

I would then do the processing with a ruby filter.

    ruby {
        code => '
            lines = event.get("message").lines(chomp: true)
            newMessage = ""
            theRest = ""
            lines.each { |x|
                if x =~ /^(ABC|CDE)/
                    newMessage += x + ","
                else
                    unless x =~ /^(\s|Timing)/
                        theRest += x + ","
                    end
                end
            }
            event.set("message", newMessage)
            event.set("data", theRest)
        '
    }
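The same loop can be tried with plain Ruby before it goes into the pipeline, which makes it quicker to experiment with the patterns. This is only a sketch: `split_lines` is a hypothetical helper name (not part of Logstash), the sample text is made up, and the extra `----` alternative in the ignore pattern is just an example of tuning.

```ruby
# Standalone copy of the filter logic, with the two patterns as parameters.
# split_lines is a hypothetical helper, not a Logstash API.
def split_lines(text, keep: /^(ABC|CDE)/, ignore: /^(\s|Timing)/)
  new_message = ""
  the_rest = ""
  text.lines(chomp: true).each do |x|
    if x =~ keep
      new_message += x + ","   # lines that go into "message"
    elsif x !~ ignore
      the_rest += x + ","      # everything else not on the ignore list
    end
  end
  [new_message, the_rest]
end

# Made-up sample input, loosely based on the lines in this thread.
sample = <<~LOG
  ABC V4.1.2 MODEL
   indented line that should be ignored
  DYNAMICS OPTION: Eulerian Mass Coordinate
  Timing for main: 1.5 elapsed seconds
  ----
  CDE: 0 hostname: xxxx.xxx
LOG

# Tuned ignore pattern that also skips the "----" separator lines.
message, data = split_lines(sample, ignore: /^(\s|Timing|----)/)
puts message
puts data
```

Once the patterns behave as expected, copy them back into the ruby filter.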

Obviously you will want to tune those regular expressions. With the file you showed that will get you

      "data" => "Quilting with   1 groups of   0 I/O tasks.,DYNAMICS OPTION: Eulerian Mass Coordinate,",
   "message" => "ABC V4.1.2 MODEL,ABC restart, LBC starts at 1979-12-19_00:00:00 and restart starts at 1979-12-19_00:00:00,CDE: 0 hostname: xxxx.xxx,"

ansamHox · September 13, 2021, 7:14pm

Thank you Badger, as usual

Badger · September 13, 2021, 7:18pm

Sounds like you are not using the multiline codec correctly. Note that you may need to configure max_lines and max_bytes if the files are tens of thousands of lines long.
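A sketch of where those two options go. The limits shown are purely illustrative and should be sized to the largest file you expect; the pattern is again an arbitrary placeholder that is assumed never to match. If I remember the documentation correctly, the defaults are 500 lines and 10 MiB, so a longer file gets cut into several events unless the limits are raised.

```
codec => multiline {
    # placeholder pattern, assumed never to match any line
    pattern => "^Spalanzani"
    negate => true
    what => "previous"
    auto_flush_interval => 1
    # illustrative limits only, not recommendations
    max_lines => 100000
    max_bytes => "50 MiB"
}
```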

system · October 11, 2021, 7:31pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
