Cannot manage nested fields

Hi,
Could you please help me understand why removing nested fields with 'grok' and 'mutate' is not working in my case, and how to remove them properly via filter settings? Any well-known best practices on this are welcome as well.
What is wrong with the current configuration?

filter {
  if [type] == "webapp_access-events" {
    grok {
      patterns_dir => [ "/etc/logstash/patterns.d" ]
      match => { "message" => "%{GHTTP}" }
      overwrite => [ "message" ]
      #remove_field => [ "[webapp_access-events][os]", "[webapp_access-events][os_name]", "[webapp_access-events][beat.hostname]", "[webapp_access-events][beat.version]", "[webapp_access-events][input]" ]
    }
    mutate {
      remove_field => [ "[webapp_access-events][os]", "[webapp_access-events][os_name]", "[webapp_access-events][beat.hostname]", "[webapp_access-events][beat.version]", "[webapp_access-events][input]" ]
    }
  }
}

curl -s -XGET http://`hostname`:9200/webapp-events-2017.06.05/_mapping/field/os_name?pretty=true
{
  "webapp-events-2017.06.05" : {
    "mappings" : {
      "webapp_access-events" : {
        "os_name" : {
          "full_name" : "os_name",
          "mapping" : {
            "os_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }
      }
    }
  }
}

logstash version 5.3.2

/usr/share/logstash/bin/logstash-plugin list | grep filter
...
logstash-filter-grok
logstash-filter-mutate
...

Please show an example event from ES (or a stdout { codec => rubydebug } output).
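
For reference, a minimal sketch of such a debugging output, assuming it is added temporarily next to whatever outputs the pipeline already has:

```
output {
  # Print every event to the console in a readable, field-by-field form.
  stdout { codec => rubydebug }
}
```

The printed event shows exactly where each field sits (root level or nested), which is what matters for writing the remove_field references.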

I got the error "Sorry, you can only mention 10 users in a post." (not sure what it means), so I am trimming the original message:

{
"_index": "webapp-events-2017.06.07",
"_type": "webapp_access-events",
"_id": "AVyEHiFJrNmMuqrEW1OI",
"_source": {
...
"beat": {
"hostname": "hostname1",
"name": "hostname1",
"version": "5.4.0"
},
"host": "hostname1",
"client_ip": "10.10.10.1",
"geoip": {},
"offset": 45834794,
"method": "GET",
"req_file": "..reqfile..",
"os": "Other",
...
"message": "..message..",
"tags": [
"beats_input_codec_plain_applied",
"_geoip_lookup_failure"
],
"referrer": "\"-\"",
"input": "735",
"@timestamp": "2017-06-07T19:53:00.000Z",
"response": 200,
"bytes": 498,
"name": "Other",
"os_name": "Other",
"device": "Other"
}
...

Right, as I suspected. You say

remove_field => [ "[webapp_access-events][os]", ...

as if os was a subfield of webapp_access-events, but it's actually a field at the root. So, turn

remove_field => [ "[webapp_access-events][os]", "[webapp_access-events][os_name]", "[webapp_access-events][beat.hostname]", "[webapp_access-events][beat.version]", "[webapp_access-events][input]" ]

into this:

remove_field => [ "os", "os_name", "[beat][hostname]", "[beat][version]", "input" ]

(Note [beat][hostname], not [beat.hostname].)

Appreciate your help @magnusbaeck!
Some of the fields were successfully removed, but some were not.

In the conf file:

...
grok {
...
remove_field => [ "device", "name", "os", "os_name", "output", "[beat][hostname]", "[beat][version]", "input", "input_type", "ident", "version", "source", "host", "offset", "http_version", "referrer" ]

but the incoming events look like this:

...
(metaFields)
...
{
"_source": {
"request": "req",
"agent": "agent",
"auth": "-",
"req_param": "req_param",
"type": "webapp_access-events",
"serve_in_sec": "0",
"@version": "1",
"beat": {
"name": "hostname1"
},
"client_ip": "10.10.10.1",
"geoip": {},
"method": "GET",
"req_file": "reqfile",
"os": "Other",
"serve_in_msec": "296",
"message": "mess",
"tags": [
"beats_input_codec_plain_applied",
"_geoip_lookup_failure"
],
"@timestamp": "2017-06-08T11:36:58.000Z",
"response": 200,
"bytes": 498,
"name": "Other",
"os_name": "Other",
"device": "Other"
},
"@timestamp": [
1496921818000
]
},
"sort": [
1496921818000
]
}
As you can see, for some reason some of them ("name", "os", "os_name", "device") are still there.
Second question: should 'remove_field' be invoked from "grok" or from "mutate"?

As you can see, for some reason some of them ("name", "os", "os_name", "device") are still there.

Please show your full configuration.

Second question: should 'remove_field' be invoked from "grok" or from "mutate"?

It depends. Normally, remove_field is only effective if the filter it's in was successful, i.e. if you put it in a grok filter it'll only remove fields if the grok filter matches. If it's the sole option in a mutate filter it'll run unconditionally.
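
A rough sketch of the difference, reusing the grok pattern from the configuration in this thread. The two field names are only picked as examples:

```
filter {
  grok {
    patterns_dir => [ "/etc/logstash/patterns.d" ]
    match => { "message" => "%{GHTTP}" }
    # Applied only when the pattern matches the message.
    remove_field => [ "offset" ]
  }
  mutate {
    # No other options here, so this is applied to every event that reaches the filter.
    remove_field => [ "input_type" ]
  }
}
```

In both cases the field has to exist at the point where the filter runs; fields created by a later filter are not affected.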

cat conf.d/60-http.conf

filter {
  if [type] == "webapp_access-events" {
    grok {
      patterns_dir => [ "/etc/logstash/patterns.d" ]
      match => { "message" => "%{GHTTP}" }
      remove_field => [ "device", "name", "os", "os_name", "output", "[beat][hostname]", "[beat][version]", "input", "input_type", "ident", "version", "source", "host", "offset", "http_version", "referrer" ]
      overwrite => [ "message" ]
    }

    if "_grokparsefailure" in [tags] {
      drop {}
    }

    mutate {
      convert => { "response" => "integer" }
      convert => { "bytes" => "integer" }
    }
    geoip {
      source => "client_ip"
      target => "geoip"
      #add_tag => [ "apache-geoip" ]
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      remove_field => [ "timestamp" ]
    }
    useragent {
      source => "agent"
    }
  }
}

The useragent filter produces the fields after you've tried to remove them in your grok filter. Add a mutate filter that deletes them when they actually exist.
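
A minimal sketch of that change, assuming the useragent filter keeps writing its fields to the root of the event as in the output above. The field list is simply the four fields that were reported as left over:

```
  useragent {
    source => "agent"
  }
  # Placed after useragent, so the fields it created exist by now and can be removed.
  mutate {
    remove_field => [ "name", "os", "os_name", "device" ]
  }
```

The same four names can then be dropped from the remove_field list in the grok filter, since they do not exist yet at that stage.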

Awesome @magnusbaeck!
It works!
You are the best :wink:
