How to overwrite message (set it to a dash) if it is not overwritten by grok?

Hello. The message field contains JSON, but some messages have no JSON. In the case where there is no JSON, I want to set message to a dash. How can I do that? Please help me.

cat /etc/logstash/mytests/test1

input {
  generator { count => 1 message => '2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED' }
  generator { count => 1 message => '2018-06-20 12:58:52 > 95.153.222.121 > RESOURCE#51059 > DRIVER #1976 > ACTION QUERY: {"action":"orderSum","parameters":{"trip_id":12507}}' }
}
output { stdout { codec => rubydebug { metadata => true } } }
filter {
  grok {
    match => { "message" => "%{DATESTAMP:timestamp}%{SPACE}>%{SPACE}%{IP:remote_ip}%{SPACE}>%{SPACE}RESOURCE#%{NONNEGINT:resource}%{SPACE}>%{SPACE}((?\w+$)|DRIVER%{SPACE}#%{NONNEGINT:driver_id}%{SPACE}>%{SPACE}(?([\s\w]+[^:{]+$|\w+(\s+[^\s:]+))?):?(%{SPACE}(?{.*}))?)" }
    overwrite => [ "message" ]
  }
  date {
    match => ["timestamp", "yy-MM-dd HH:mm:ss"]
    target => "@timestamp"
    timezone => "Europe/Moscow"
  }
  mutate {
    remove_field => [ "timestamp" ]
  }
}

xen ~ # /opt/logstash/bin/logstash -f /etc/logstash/mytests/test1

{
      "sequence" => 0,
     "driver_id" => "1976",
     "remote_ip" => "95.153.222.121",
    "@timestamp" => 2018-06-20T09:58:52.000Z,
      "resource" => "51059",
      "@version" => "1",
          "host" => "xen",
       "message" => "{\"action\":\"orderSum\",\"parameters\":{\"trip_id\":12507}}",
      "category" => "ACTION QUERY"
}
{
      "sequence" => 0,
     "remote_ip" => "95.153.131.227",
    "@timestamp" => 2018-06-13T17:18:08.000Z,
      "resource" => "24973",
      "@version" => "1",
          "host" => "xen",
       "message" => "2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED",
      "category" => "OPENED"
}

In the second output block (produced by the first generator in the input) I want message set to "-", not the original message, because the original message has no JSON. Grok matches in both cases; it is just that in the first case the optional "message" capture does not match.

Instead of using an optional field in the grok pattern, you could let the grok fail, resulting in a _grokparsefailure tag being added to the event. Then test for that tag and mutate the message field.
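For illustration, a minimal sketch of that approach, assuming the grok pattern is written so that it only matches lines that actually contain the trailing JSON (lines without JSON then get the _grokparsefailure tag):

    if "_grokparsefailure" in [tags] {
        mutate {
            remove_tag => [ "_grokparsefailure" ]
            replace => { "message" => "-" }
        }
    }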

Thank you for the reply. But if the message does not match, no fields appear at all.

input {
  generator { count => 1 message => '2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED' }
}
output { stdout { codec => rubydebug { metadata => true } } }
filter {
  grok {
    match => { "message" => "%{DATESTAMP:timestamp}%{SPACE}>%{SPACE}%{IP:remote_ip}%{SPACE}>%{SPACE}RESOURCE#%{NONNEGINT:resource}%{SPACE}>%{SPACE}(?[\s\w]+[^:{]+$|\w+(\s+[^\s:]+)):%{SPACE}(?{.*})" }
    overwrite => [ "message" ]
  }

As a result there is no "remote_ip", no "resource", and no "category" extracted from the message. Where do I get these fields from?

{
      "sequence" => 0,
    "@timestamp" => 2018-07-04T14:27:02.934Z,
      "@version" => "1",
          "host" => "xen",
       "message" => "2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Added: I think I need to save "message" into an "origmessage" field before the grok, and after the grok compare "origmessage" with "message": if they are equal, then mutate "message".
But I cannot work out how to do that.
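For what it is worth, a rough sketch of that save-and-compare idea might look like the following; the field name [@metadata][origmessage] is only an illustrative choice, and the grok filter is the one already shown above:

    mutate {
        # keep a copy of the original line before grok runs (illustrative field name)
        add_field => { "[@metadata][origmessage]" => "%{message}" }
    }
    # ... the grok filter with overwrite => [ "message" ] goes here ...
    if [message] == [@metadata][origmessage] {
        # grok did not overwrite message, so the line had no JSON
        mutate { replace => { "message" => "-" } }
    }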

Thanks for supplying the complete configuration, but I get a syntax error with that pattern, which makes me think something in it is being mangled. Can you post just that line indented with four spaces so that it shows up like this?

match => { "message" => "%{DATESTAMP:timestamp}%{SPACE}>%{SPACE}%{IP:remote_ip}%{SPACE}>%{SPACE}RESOURCE#%{NONNEGINT:resource}%{SPACE}>%{SPACE}((?\w+$)|DRIVER%{SPACE}#%{NONNEGINT:driver_id}%{SPACE}>%{SPACE}(?([\s\w]+[^:{]+$|\w+(\s+[^\s:]+))?):?(%{SPACE}(?{.*}))?)" }

Badger, thank you for the reply. The original regex is below. After pasting it into the textarea I press "Preformatted text", and in the preview on the right a green vertical line appears to the left of the pasted text, as in your example. Is this the right way to format it? "match" and "overwrite" have 8 spaces from the left, "grok" has 6 spaces. Sorry, I am a newbie with this forum and its formatting (and with English).

grok {
    match => { "message" => "%{DATESTAMP:timestamp}%{SPACE}>%{SPACE}%{IP:remote_ip}%{SPACE}>%{SPACE}RESOURCE#%{NONNEGINT:resource}%{SPACE}>%{SPACE}((?<category>\w+$)|DRIVER%{SPACE}#%{NONNEGINT:driver_id}%{SPACE}>%{SPACE}(?<category>([\s\w]+[^:\{]+$|\w+(\s+[^\s:]+))?):?(%{SPACE}(?<message>\{.*\}))?)" }
    overwrite => [ "message" ]
}

Input lines may come with JSON at the end or without it. If there is no JSON, I want to set "message" = "-"; otherwise "message" = the JSON.

Example lines:
2018-05-12 16:21:52 > 95.153.128.248 > RESOURCE#47544 > DRIVER #1913 > DISCONNECTED
2018-05-12 16:21:52 > 95.153.128.248 > RESOURCE#47544 > CLOSED
2018-05-12 16:21:53 > 95.153.135.37 > RESOURCE#47660 > DRIVER #1883 > DISCONNECTED
2018-05-12 16:21:53 > 95.153.135.37 > RESOURCE#47660 > CLOSED
2018-05-12 16:21:53 > 95.153.222.59 > RESOURCE#47698 > DRIVER #1837 > ACTION QUERY: {"action":"orderSum","parameters":{"trip_id":4085}}

In all cases the message has RESOURCE. In some cases the message also has DRIVER. After DRIVER there is a CATEGORY. If the CATEGORY is "ACTION QUERY", then a colon and JSON follow. But there may also be nothing after the CATEGORY (for example: CLOSED, DISCONNECTED, ...).

Added: after RESOURCE there may be a CATEGORY or there may be a DRIVER. After DRIVER there may be a CATEGORY only, or CATEGORY: JSON.

Added 2: I see that in your quote of the regex (taken from my badly quoted regex) there are no backslashes before { and } at the end of the regex (the "message"/JSON part).

Added 3: I tested all the input strings on https://grokconstructor.appspot.com/do/match#result until they all matched. After that I replaced GREEDYDATA with the JSON pattern. But that site has no overwrite, mutate or other Logstash directives, so I run the tests locally with Logstash 6.2.4.

Does this solve your problem?

json { source => "message" }
if "_jsonparsefailure" in [tags] {
    mutate { 
        remove_tag => [ "_jsonparsefailure" ]
        replace => { "message" => "-" }
    }
}

Badger, thank you for the reply. This code partially solves my problem, but another problem appears: Logstash warning messages and extra fields in the output.

[INFO ] 2018-07-04 20:23:47.047 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.2.4"}
[INFO ] 2018-07-04 20:23:47.638 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9601}
[INFO ] 2018-07-04 20:23:57.350 [Ruby-0-Thread-1: /opt/logstash/lib/bootstrap/environment.rb:6] pipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>24, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[INFO ] 2018-07-04 20:23:58.013 [Ruby-0-Thread-1: /opt/logstash/lib/bootstrap/environment.rb:6] pipeline - Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x51489734 run>"}
[INFO ] 2018-07-04 20:23:58.233 [Ruby-0-Thread-1: /opt/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :pipelines=>["main"]}
[WARN ] 2018-07-04 20:23:58.995 [Ruby-0-Thread-29@[main]>worker20: :1] json - Error parsing json {:source=>"message", :raw=>"2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED", :exception=>#<LogStash::Json::ParserError: Unexpected character ('-' (code 45)): Expected space separating root-level values
at [Source: (byte[])"2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED"; line: 1, column: 6]>}

{
      "sequence" => 0,
    "@timestamp" => 2018-06-13T17:18:08.000Z,
     "remote_ip" => "95.153.131.227",
      "resource" => "24973",
      "@version" => "1",
          "host" => "xen",
       "message" => "-",
      "category" => "OPENED",
          "tags" => []
}
{
      "sequence" => 0,
     "driver_id" => "1976",
    "@timestamp" => 2018-06-20T09:58:52.000Z,
     "remote_ip" => "95.153.222.121",
      "resource" => "51059",
      "@version" => "1",
          "host" => "xen",
        "action" => "orderSum",
       "message" => "{\"action\":\"orderSum\",\"parameters\":{\"trip_id\":12507}}",
      "category" => "ACTION QUERY",
    "parameters" => {
        "trip_id" => 12507
    }
}

[INFO ] 2018-07-04 20:24:00.251 [[main]-pipeline-manager] pipeline - Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x51489734 run>"}

The Logstash warning message is quoted above. It seems that the json filter attempts to parse the "-" (dash) in the timestamp.
In the second output block the "action" and "parameters" fields appear, taken from the JSON. In this case I do not need the JSON parsed.

"action" => "orderSum",
...
"parameters" => {
    "trip_id" => 12507
}

In the first output block "message" is "-", as I wanted, thanks. But an extra empty "tags" field appears.

The config with your extra code:

input {
  generator { count => 1 message => '2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED' }
  generator { count => 1 message => '2018-06-20 12:58:52 > 95.153.222.121 > RESOURCE#51059 > DRIVER #1976 > ACTION QUERY: {"action":"orderSum","parameters":{"trip_id":12507}}' }
}
output { stdout { codec => rubydebug { metadata => true } } }
filter {
  grok {
    match => { "message" => "%{DATESTAMP:timestamp}%{SPACE}>%{SPACE}%{IP:remote_ip}%{SPACE}>%{SPACE}RESOURCE#%{NONNEGINT:resource}%{SPACE}>%{SPACE}((?<category>\w+$)|DRIVER%{SPACE}#%{NONNEGINT:driver_id}%{SPACE}>%{SPACE}(?<category>([\s\w]+[^:\{]+$|\w+(\s+[^\s:]+))?):?(%{SPACE}(?<message>\{.*\}))?)" }
    overwrite => [ "message" ]
  }
  date {
    match => ["timestamp", "yy-MM-dd HH:mm:ss"]
    target => "@timestamp"
    timezone => "Europe/Moscow"
  }
  mutate {
    remove_field => [ "timestamp" ]
  }
  json {
    source => "message"
  }
  if "_jsonparsefailure" in [tags] {
    mutate {
      remove_tag => [ "_jsonparsefailure" ]
      replace => { "message" => "-" }
    }
  }
}

You can remove the empty array using

ruby {
    code => '
        if event.get("tags") == []
            event.remove("tags")
        end
    ' 
}

And in the json filter, if you do not actually want the json parsed, parse into a sub-field of [@metadata]

json { source => "message" target => "[@metadata][junk]" }

Badger, thank you for the reply. The result is:
[WARN ] 2018-07-06 17:09:26.202 [Ruby-0-Thread-27@[main]>worker18: :1] json - Error parsing json {:source=>"message", :raw=>"2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED", :exception=>#<LogStash::Json::ParserError: Unexpected character ('-' (code 45)): Expected space separating root-level values
at [Source: (byte[])"2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED"; line: 1, column: 6]>}

{
      "sequence" => 0,
     "remote_ip" => "95.153.131.227",
    "@timestamp" => 2018-06-13T17:18:08.000Z,
      "resource" => "24973",
      "@version" => "1",
          "host" => "xen",
       "message" => "-",
      "category" => "OPENED"
}
{
      "sequence" => 0,
     "driver_id" => "1976",
     "remote_ip" => "95.153.222.121",
    "@timestamp" => 2018-06-20T09:58:52.000Z,
      "resource" => "51059",
     "@metadata" => {
        "junk" => {
                "action" => "orderSum",
            "parameters" => {
                "trip_id" => 12507
            }
        }
    },
      "@version" => "1",
          "host" => "xen",
       "message" => "{\"action\":\"orderSum\",\"parameters\":{\"trip_id\":12507}}",
      "category" => "ACTION QUERY"
}

I am worried about the warning. It will appear in logstash-plain.log for every line that has no JSON at the end.
What about my variant: save the message before the grok and compare it with the message after the grok? If they are equal, then there is no JSON in the message and I can mutate the message and set the dash.

Yeah, it works, but it does result in a rather noisy log file. The answer would be to test if the message is there. You could do that by checking for a grokfailure, as you mentioned, or possibly checking for some other pattern. For example, do only messages with json contain the string '{"'? Is it only ACTION QUERY that contains json?
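For instance, a minimal sketch of that second idea, assuming '{"' only ever appears in the trailing JSON:

    if [message] !~ /\{"/ {
        # no JSON anywhere in this line, so blank out the message
        mutate { replace => { "message" => "-" } }
    }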

Badger, if the JSON is present, it is positioned at the end of the string, after ACTION QUERY:%{SPACE} or ACTION ANSWER:%{SPACE}. The symbols "{" and "}" do not appear anywhere else in the source strings, only in the JSON at the end of the string (if the JSON exists).

Why not try my variant: save the original message and compare it with the message after the grok? I tried to do it this way, but without success; I always get a syntax error.

OK, so you could do something like

if [message] =~ /ACTION (QUERY|ANSWER): / {
    # expect JSON
}

Badger, it seems to be OK, without the json filter, the junk field, and the ruby code.

input {
  generator { count => 1 message => '2018-06-13 20:18:08 > 95.153.131.227 > RESOURCE#24973 > OPENED' }
  generator { count => 1 message => '2018-06-20 12:58:52 > 95.153.222.121 > RESOURCE#51059 > DRIVER #1976 > ACTION QUERY: {"action":"orderSum","parameters":{"trip_id":12507}}' }
}
output { stdout { codec => rubydebug { metadata => true } } }
filter {
  if [message] =~ /ACTION (QUERY|ANSWER): / {
    grok {
      match => { "message" => "%{DATESTAMP:timestamp}%{SPACE}>%{SPACE}%{IP:remote_ip}%{SPACE}>%{SPACE}RESOURCE#%{NONNEGINT:resource}%{SPACE}>%{SPACE}((?<category>\w+$)|DRIVER%{SPACE}#%{NONNEGINT:driver_id}%{SPACE}>%{SPACE}(?<category>([\s\w]+[^:\{]+$|\w+(\s+[^\s:]+))?):?(%{SPACE}(?<message>\{.*\}))?)" }
      overwrite => [ "message" ]
    }
  }
  else {
    grok {
      match => { "message" => "%{DATESTAMP:timestamp}%{SPACE}>%{SPACE}%{IP:remote_ip}%{SPACE}>%{SPACE}RESOURCE#%{NONNEGINT:resource}%{SPACE}>%{SPACE}((?<category>\w+$)|DRIVER%{SPACE}#%{NONNEGINT:driver_id}%{SPACE}>%{SPACE}(?<category>([\s\w]+[^:\{]+$|\w+(\s+[^\s:]+))?):?(%{SPACE}(?<message>\{.*\}))?)" }
    }
    mutate {
      replace => { "message" => "-" }
    }
  }
  date {
    match => ["timestamp", "yy-MM-dd HH:mm:ss"]
    target => "@timestamp"
    timezone => "Europe/Moscow"
  }
  mutate {
    remove_field => [ "timestamp" ]
  }
}

[INFO ] 2018-07-13 18:04:53.131 [Ruby-0-Thread-1: /opt/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :pipelines=>["main"]}

{
      "sequence" => 0,
     "remote_ip" => "95.153.131.227",
    "@timestamp" => 2018-06-13T17:18:08.000Z,
      "resource" => "24973",
      "@version" => "1",
          "host" => "xen",
      "category" => "OPENED",
       "message" => "-"
}
{
      "sequence" => 0,
     "driver_id" => "1976",
     "remote_ip" => "95.153.222.121",
    "@timestamp" => 2018-06-20T09:58:52.000Z,
      "resource" => "51059",
      "@version" => "1",
          "host" => "xen",
      "category" => "ACTION QUERY",
       "message" => "{\"action\":\"orderSum\",\"parameters\":{\"trip_id\":12507}}"
}

[INFO ] 2018-07-13 18:04:55.022 [[main]-pipeline-manager] pipeline - Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x74885391 run>"}

Thank you for the help.
