Logstash gsub remove string/characters

I need to remove a specific string from the raw log in my log file.

My raw log sample is below.

  Sep 18 150942 db[27414]:  category=\"Event\" subcategory=\"System\"....
  Sep 18 151043 db[27464]:  category=\"Event\" subcategory=\"System\" ...

I want to remove "db[xxxxx]:" from the raw log before it is parsed with kv.

I tried several options, but could not get the expected result.

Here is one sample regex that I tried with gsub:

    mutate {
      gsub => [
        "message", "db[.*]", ""
      ]
    }

But this doesn't remove the string.

You need to escape the square brackets for that to work.
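The reason is that an unescaped `[.*]` is a character class that matches a single `.` or `*`, so `db[.*]` only matches `db.` or `db*`. A minimal sketch of the escaped version, assuming the PID is always numeric and that the default `config.support_escapes: false` is in effect so the backslashes reach the regex engine unchanged:

```
filter {
  mutate {
    gsub => [
      # \[ and \] match literal brackets; [0-9]+ matches the PID.
      "message", "db\[[0-9]+\]: ", ""
    ]
  }
}
```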

But I suggest that you use a grok filter to extract the different pieces of the log messages into discrete fields (timestamp in one field, program name in one field, pid in one field, and key/value pairs in one field).

That resolved the square-bracket problem. Now I have another issue: when I parse the raw log with kv, the error message string at the end is not extracted.

  Sep 18 18:15:43  category=\"Event\" subcategory=\"System\" typeid=30909 level=\"error\" user=\"admin\" nas=\"\" action=\"\" status=\"\" FTM license activation error: unable to resolve server domain name: directregistration.abc.com:443\u0000

The trailing message

FTM license activation error: unable to resolve server domain name: directregistration.abc.com:443\u0000

is not parsed by kv. How can I extract it? Is there a kv option for this?

That string comes after the key/value pairs so the kv filter won't help you. You should be able to extract the string with a grok filter if you can describe when the key/value pairs end. For example, is it safe to assume that the message at the end never contains a double quote? Put differently, how would grok know that "FTM license" is where the message begins?

The last kv pair is "status=". The value of status may be anything, and the error message comes after it. We can't say that the error message begins with an exact word, because it may change. The only thing we can rely on is the last key/value pair.

Well, it's good enough if we know that the last key is status. Then a grok expression similar to

(match date and program/pid here) (?<kvpairs>.* status="[^"]*" )%{GREEDYDATA:extra_message}

should work.
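As a sketch, the whole filter could look like the one below. It assumes that the quotes in the actual event are plain `"` characters (the backslashes in the samples being display escaping) and that the line may start with a `<NN>` priority. It uses `[^"]*` so that an empty `status=""` also matches. The field names `syslog_pri`, `program`, `pid`, `kvpairs`, and `extra_message` are only illustrative.

```
filter {
  grok {
    # Single-quoted so the double quotes in the pattern need no escaping.
    match => {
      "message" => '(?:<%{NONNEGINT:syslog_pri}>)?%{SYSLOGTIMESTAMP:timestamp} %{PROG:program}\[%{POSINT:pid}\]:\s+(?<kvpairs>.*\bstatus="[^"]*")\s*%{GREEDYDATA:extra_message}'
    }
  }
  kv {
    # Parse only the key/value part; the trailing text stays in extra_message.
    source => "kvpairs"
  }
}
```

Because `.*` is greedy, `kvpairs` runs up to the last `status="..."` on the line, so this breaks down if the free-text message itself contains another `status="..."`.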

I tried the code below:

    grok {
      match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{GREEDYDATA:kvpairs}" ]
    }

    kv {
      source => "kvpairs"
    }

The code above correctly extracts the date and the key/value pairs.

Sample output for this filter is listed below; the full raw message is still present in the result.

 {
   "nas_name" => "\"\"",
  "log_level" => "error",
     "source" => "192.168.1.1",
    "message" => "Sep 19 09:31:42  category=\"Event\" subcategory=\"System\" typeid=30909 level=\"error\" user=\"admin\" nas=\"\" action=\"\" status=\"\" FTM license activation error: unable to resolve server domain name: directregistration.abc.com:443\u0000",
       "type" => "forti-authnticator",
   "@version" => "1",
     "action" => "\"\"",
     "typeid" => "30909",
   "category" => "Event",
"subcategory" => "System",
       "user" => "admin",
  "timestamp" => "Sep 19 09:31:42",
     "status" => "\"\""
}

I added an additional grok filter to separate the message at the end of the log, but the config file gives errors and fails to run.

As you suggested, I tried to separate the extra message at the end of the raw log, but everything I tried gives lots of errors. What is the best way to match it? Is it okay to apply a second grok pattern to separate it?

I can't help without knowing what your configuration looked like.

Here is the sample log

<11>Sep 18 12:16:42 db[19023]:  category=\"Event\" subcategory=\"System\" typeid=30909 level=\"error\" user=\"admin\" nas=\"\" action=\"\" status=\"\" FTM license activation error: unable to resolve server domain  name: directregistration.abc.com:443\u0000

Here is the configuration file:

input {
  udp {
    port => 30001
    type => "raw_log"
  }
}

filter {
  if [type] == "raw_log" {
    mutate {
      gsub => [
        # remove db[xxxxx]:
        "message", "db\[.*\]: ", "",
        # remove <xx>
        "message", "<.*>", ""
      ]
    }

    grok {
      match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{GREEDYDATA:kvpairs}" ]
    }

    kv {
      source => "kvpairs"
    }

    mutate {
      rename => [ "oid", "type_id" ]z
      rename => [ "logid", "log_id" ]
      rename => [ "level", "log_level" ]
      rename => [ "cat", "category" ]
      rename => [ "subcategorty", "sub_categorty" ]
      rename => [ "nas", "nas_name" ]
      rename => [ "host", "source" ]
      remove_field => [ "@timestamp", "kvpairs" ]
    }
  }
}

output {
  stdout { codec => rubydebug }
}

Could you help me sort this out?

  rename => [ "oid", "type_id" ]z

Do you really have a "z" there?

No, that was a mistake I made when pasting it here. There is no z.

 rename => [ "oid", "type_id" ]

I have resolved it using grok, with the help of your hint.

I need to resolve one more thing: I need to convert SYSLOGTIMESTAMP into TIMESTAMP_ISO8601.

For example:

Sep 19 16:41:43 into 2017-09-19 16:41:43

I tried the code below:

    date {
      match => ["timestamp", " yyyy-MM-dd HH:mm:ss.SSS"]
    }

It doesn't convert to the required format.

The pattern should be the format you have, not the format you want. See https://www.elastic.co/guide/en/logstash/current/config-examples.html#_processing_syslog_messages for an example.
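For a field that holds something like `Sep 19 16:41:43`, a sketch along the lines of the linked example would be the following. Two patterns are listed because syslog pads single-digit days with an extra space. The input carries no year, so the filter has to infer one.

```
filter {
  date {
    # Describe the input format; the parsed result is stored in @timestamp.
    match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
}
```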

Yes, that's why I need a method to convert it into the format I need.

The date filter doesn't let you configure the output format. You'll have to write some Ruby code in a ruby filter.
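A minimal sketch of such a ruby filter, assuming Logstash 5.0 or later (for the `event.get`/`event.set` API) and that a date filter has already stored the parsed time in `@timestamp`. The target field name `timestamp_formatted` is made up for illustration, and the result is in UTC rather than local time.

```
filter {
  ruby {
    # @timestamp is a LogStash::Timestamp; .time returns a Ruby Time in UTC.
    code => 'event.set("timestamp_formatted", event.get("@timestamp").time.strftime("%Y-%m-%d %H:%M:%S"))'
  }
}
```

This has to run before any mutate that removes `@timestamp`, otherwise `event.get("@timestamp")` returns nil and the code raises an error.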
