Inputting JSON object to logstash - unable to remove certain fields or parse as JSON


#1

For some context, I am completely new to Logstash and am interested in filtering this data in four ways:

  1. remove certain fields such as "anonymousId" or "library" (which is nested within "context")
  2. extract fields from a nest (e.g. move "plan" outside of its nest of "properties")
  3. rename fields
  4. make the object as 'flat' (un-nested) as possible.

Below is the raw JSON that is being forwarded to Logstash via an HTTP POST, and then I will go through what I have tried.

{
    "anonymousId":null,
    "channel":"server",
    "context":{
        "ip":"208.54.83.183",
        "library":{
            "name":"analytics-ruby",
            "version":"2.0.12"},
        "userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/43.0.2357.130 Chrome/43.0.2357.130 Safari/537.36"},
    "event":"Logged In",
    "integrations":{},
    "messageId":"a4c76305-1b02-46c9-8399-76a58fc8edb9",
    "originalTimestamp":"2015-07-27T11:19:27.797+02:00",
    "projectId":"Y0xNBc7l2I",
    "properties":{"plan":"Admin"},
    "receivedAt":"2015-07-27T09:19:29.374Z",
    "sentAt":"2015-07-27T09:19:27.809Z",
    "timestamp":"2015-07-27T09:19:29.362Z",
    "type":"track",
    "userId":"1",
    "version":2,
    "writeKey":"keqIuqD3O8iL1M5"
} 
  • Using grok's remove_field was partially successful, but I was unable to remove "anonymousId" (my instinct tells me that this was due to it having a null value)
  • I was able to remove "channel" and "integrations", but was unable to remove "context" or "library" (I attempted to use object-dot notation)
  • As odd as it may sound, all of the above was done without the "json" codec
  • Now I am trying to achieve my goals with the "json" codec and the json filter plugin, but to no avail: I am currently unable to remove, add, or alter any fields. Below is my config file:
input {
  http {
    port => 8090
    codec => "json"
  }
}
filter {
  grok {
    match => [ "message", "%{GREEDYDATA}" ]
  }
  json {
    source => "message"
    remove_field => "channel"
  }
}
output {
  stdout { codec => rubydebug }
}

If anyone could point out my oversight or redirect my efforts, it would be greatly appreciated.
Edit: I apologize for the raw JSON not being very readable; I'm working on getting the post to display the JSON with proper formatting.


(Magnus Bäck) #2
  • Use the json codec or the json filter, not both.
  • Your current grok filter serves no purpose.
  • To reference nested fields, use the [field][subfield] notation. See the documentation.
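
For example (a minimal sketch based on the event shown above, not tested against your setup), removing the nested "library" object would look like:

```
filter {
  mutate {
    # [context][library] addresses the "library" object nested inside "context"
    remove_field => [ "[context][library]" ]
  }
}
```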

#3

Thank you for the quick response. I am unfortunately still getting a parse error with this config:

input {
  http {
    port => 8090
    codec => "json"
  }
}
filter {
  grok {
    remove_field => ["channel", "userId", "anonymousId"]
  }
}
output {
  stdout { codec => rubydebug }
}

Perhaps I am not understanding Logstash's default behaviors. Here is the output:

{
          "anonymousId" => nil,
              "channel" => "server",
              "context" => {
               "ip" => "208.54.83.183",
          "library" => {
               "name" => "analytics-ruby",
            "version" => "2.0.12"
        },
        "userAgent" => "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/43.0.2357.130 Chrome/43.0.2357.130 Safari/537.36"
    },
                "event" => "Logged In",
         "integrations" => {},
            "messageId" => "a4c76305-1b02-46c9-8399-76a58fc8edb9",
    "originalTimestamp" => "2015-07-27T11:19:27.797+02:00",
            "projectId" => "Y0xNBc7l2I",
           "properties" => {
        "plan" => "Admin"
    },
           "receivedAt" => "2015-07-27T09:19:29.374Z",
               "sentAt" => "2015-07-27T09:19:27.809Z",
            "timestamp" => "2015-07-27T09:19:29.362Z",
                 "type" => "track",
               "userId" => "1",
              "version" => 2,
             "writeKey" => "keqI2D3O8iLGdbt",
             "@version" => "1",
           "@timestamp" => "2015-07-28T09:27:24.909Z",
              "headers" => {
                "content_type" => "application/json",
              "request_method" => "POST",
                "request_path" => "/hooks/created_callback",
                 "request_uri" => "/hooks/created_callback",
                "http_version" => "HTTP/1.1",
                    "http_csp" => "active",
          "http_cache_control" => "no-cache",
                 "http_origin" => "chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop",
             "http_user_agent" => "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/43.0.2357.130 Chrome/43.0.2357.130 Safari/537.36",
          "http_postman_token" => "3d7015ed-4abd-d13e-f133-cf7ac4c73096",
                 "http_accept" => "*/*",
        "http_accept_encoding" => "gzip, deflate",
        "http_accept_language" => "en-US,en;q=0.8",
             "http_connection" => "close",
                   "http_host" => "localhost:8090",
              "content_length" => "674"
    },
                 "tags" => [
        [0] "_grokparsefailure"
    ]
}

Thanks again


(Magnus Bäck) #4

This looks perfectly fine. What parse error are you talking about?

Use the mutate filter to remove the fields, not the grok filter. When used with a non-mutate filter, remove_field is only applied when the filter completes successfully, and since you're not providing any grok expression, the grok filter won't count as successful (which is also why your output is tagged _grokparsefailure).
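
Applied to the goals in the original post, a mutate filter could look something like this (a sketch only; the field names are taken from your sample output):

```
filter {
  mutate {
    # Goal 1: remove unwanted fields, including the nested "library" object
    remove_field => [ "anonymousId", "channel", "[context][library]" ]
    # Goals 2 and 4: move "plan" out of "properties" to the top level
    # (rename also works for re-parenting nested fields)
    rename => { "[properties][plan]" => "plan" }
  }
}
```

Since remove_field is applied after rename within a mutate filter, you could also list "properties" in remove_field to drop the then-empty hash.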


#5

Thank you for this clarification! I am well on my way now