Mutate a field depending on field type


(Florin Andrei) #1

I'm parsing logs from a node.js app. One of the fields is a JSON document that the app inserts in the logs. I tried to parse the JSON part in a very basic way, which appeared to work fine for a while:

  json {
    source => "data"
    target => "node_post_data_json"
  }

But the problem is with one of the subfields therein, called node_post_data_json.auth. Sometimes that field contains only a string, and Logstash happily sends it to Elasticsearch as a string. Other times 'auth' contains a set of key/value pairs, which Logstash then parses as an object. Which of the two shows up in a given log line appears to be random.

The problem is that Elasticsearch throws errors when a field flip-flops like this between string and object: the first document indexed fixes the field's mapping, and later documents carrying the other type are rejected. The reason for throwing the error is described here:

I cannot force the node.js app to stop changing the type of the 'auth' field, it's out of my control. It's returned as either string or object depending on the whim of that app.

Is there a way to mutate that field based on the field's type? What is the syntax to detect the field type (either string or object)? I would like to do this:

  • if the auth field is an object, leave it alone
  • if the auth field is a string, convert it into an object and store the string value in a sub-field called 'string' (or some other key that doesn't conflict with the pre-existing subfields there). Or, alternatively, delete the 'auth' field and re-create it as auth_string or something.
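In Ruby, which is what the Logstash `ruby` filter runs, the type check is `is_a?` or a `case`/`when` on the value. The sketch below is plain Ruby outside Logstash and implements only the first option. It assumes `doc` stands for the hash the `json` filter produced for `node_post_data_json`; the method name `wrap_string_auth` and the sample payloads are made up for illustration.

```ruby
require 'json'

# Hypothetical helper (not a Logstash API): makes sure the 'auth' entry of a
# parsed JSON hash is always an object by nesting a bare string under 'string'.
def wrap_string_auth(doc)
  case doc['auth']
  when String
    # Bare string: wrap it so the field is an object in every event.
    doc['auth'] = { 'string' => doc['auth'] }
  when Hash
    # Already an object: leave it alone.
    nil
  end
  doc
end

# Made-up sample payloads showing the two shapes the app can send.
as_string = JSON.parse('{"auth": "token-abc"}')
as_object = JSON.parse('{"auth": {"user": "alice"}}')

wrap_string_auth(as_string)
wrap_string_auth(as_object)
puts as_string['auth']['string']  # token-abc
puts as_object['auth']['user']    # alice
```

A missing `auth` key matches neither branch, so such documents pass through unchanged.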

Any suggestions are much appreciated. Thanks.


(Magnus Bäck) #2

I think you'll have to use a ruby filter. Something like this:

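ruby {
  code => "
    # Rewrite 'auth' only when it is a bare string; objects are left alone.
    # The [a][b] field reference yields nil if node_post_data_json is missing.
    if event['[node_post_data_json][auth]'].is_a? String
      event['[node_post_data_json][auth]'] = { 'string' => event['[node_post_data_json][auth]'] }
    end
  "
}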
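The `event['...']` syntax used in this thread belongs to Logstash 1.x/2.x. From Logstash 5.0 on, event fields are read and written through `event.get` and `event.set` with field-reference strings. Assuming Logstash 5.0 or later, a sketch of the same filter, converting `auth` only when it is a string as the question asks, might look like this (untested config fragment):

```
ruby {
  code => "
    auth = event.get('[node_post_data_json][auth]')
    # Only a bare string is rewritten; objects pass through unchanged.
    if auth.is_a?(String)
      event.set('[node_post_data_json][auth]', { 'string' => auth })
    end
  "
}
```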

(Florin Andrei) #3

Here's the code I've tried:

if ([message] =~ / POST /) {
  grok {
    match => { "message" => "%{DATA:timestamp_node} GMT\+0000 \(UTC\) %{WORD:event_type} %{NOTSPACE:request_id} %{URIPATH:endpoint}: %{GREEDYDATA:data}" }
  }
  json {
    source => "data"
    target => "node_post_data_json"
  }
  ruby {
    code => "
      if event['node_post_data_json']['auth'].is_a? String
        event['node_post_data_json']['auth_string'] = { 'string' => event['node_post_data_json']['auth'] }
        # somehow magically remove node_post_data_json.auth but I don't know how
      end
    "
  }
}

The idea is this:

If node_post_data_json.auth is a string, copy its value into node_post_data_json.auth_string, then delete the original field.

This way, when node_post_data_json.auth is not a string (and is therefore an object), I leave it alone.
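The move described above can be done in one step with Ruby's `Hash#delete`, which removes a key and returns its value. The plain-Ruby sketch below again assumes `doc` stands for the parsed `node_post_data_json` hash; the method name `move_string_auth` and the sample payload are hypothetical. Inside a Logstash `ruby` filter, the call that removes a field is `event.remove` with a field reference, shown in the comment as an untested Logstash 5+ variant.

```ruby
require 'json'

# Hypothetical helper: if 'auth' is a bare string, move it to 'auth_string'
# and drop 'auth'; if it is an object, leave the document unchanged.
#
# Untested Logstash 5+ ruby filter equivalent of the same move:
#   auth = event.get('[node_post_data_json][auth]')
#   if auth.is_a?(String)
#     event.set('[node_post_data_json][auth_string]', auth)
#     event.remove('[node_post_data_json][auth]')
#   end
def move_string_auth(doc)
  if doc['auth'].is_a?(String)
    # Hash#delete removes 'auth' and returns its old value in one step.
    doc['auth_string'] = doc.delete('auth')
  end
  doc
end

# Made-up sample payload.
doc = JSON.parse('{"auth": "token-abc", "user_id": 42}')
move_string_auth(doc)
puts doc.key?('auth')    # false
puts doc['auth_string']  # token-abc
```

Storing the plain string (rather than a nested hash) under `auth_string` keeps that field a string in every event, so its mapping stays stable.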

The problem is that the code never detects when that field is a string. I can see the field in Elasticsearch, where it is clearly a string (Kibana identifies it as such), but the code never triggers.

Any idea how to fix this?

