Trying to access nested json in logstash mutate filter


(Skanjila) #1

Hello Folks,
I've been hacking at this for a while now, so I thought I'd ask for some help. I have the following JSON in the message string associated with my input:

{"1.0":{"metric-id":"45","customer-id":"Cust","alarm-id":"123","alarm-name":"SomeName","alarm-threshold":"1","alarm-state":"state","talker-label":"1.1.2.3","talker-rate-per-sec":"3.221","dedupe-string":"Alarm-123","last-state-change-time":"2011-11-20T02:52:00.000Z"}}

I'd like to create a new field called metric-id and assign it the value of the metric-id above.

Here's my logstash config:

bin/logstash -e 'input {
  sqs {
    queue => "testqueue"
  }
}
filter {
  mutate {
    add_field => {
      "metric-id" => "%{[Message][1.0][metric-id]}"
    }
  }
}
output {
  amazon_es {
    hosts => ["testhost"]
    region => "testregion"
    index => "testindex"
  }
  stdout { codec => rubydebug }
}'

The above doesn't seem to extract the nested JSON correctly. I have tried several variations of add_field, such as:

"metric-id" => "%{[1.0][metric-id]}"
"metric-id" => "%{1.0.metric-id}"

What am I missing? From reading the docs, it seems like I just need square brackets around everything.

Would love some help on this.
Thanks


(Magnus Bäck) #2

"metric-id" => "%{[1.0][metric-id]}" works fine for me:

$ cat test.config 
input { stdin { codec => "json" } }
output { stdout { codec => "rubydebug" } }
filter {
  mutate {
    add_field => {
      "metric-id" => "%{[1.0][metric-id]}"
    }
  }
}
$ echo '{"1.0":{"metric-id":"45","customer-id":"Cust","alarm-id":"123","alarm-name":"SomeName","alarm-threshold":"1","alarm-state":"state","talker-label":"1.1.2.3","talker-rate-per-sec":"3.221","dedupe-string":"Alarm-123","last-state-change-time":"2011-11-20T02:52:00.000Z"}}' | /opt/logstash/bin/logstash -f test.config
Logstash startup completed
{
           "1.0" => {
                     "metric-id" => "45",
                   "customer-id" => "Cust",
                      "alarm-id" => "123",
                    "alarm-name" => "SomeName",
               "alarm-threshold" => "1",
                   "alarm-state" => "state",
                  "talker-label" => "1.1.2.3",
           "talker-rate-per-sec" => "3.221",
                 "dedupe-string" => "Alarm-123",
        "last-state-change-time" => "2011-11-20T02:52:00.000Z"
    },
      "@version" => "1",
    "@timestamp" => "2015-11-20T06:39:30.182Z",
          "host" => "lnxolofon",
     "metric-id" => "45"
}
Logstash shutdown completed

(Skanjila) #3

OK, just a couple of follow-up questions. First off, my logstash config looks slightly different:
bin/logstash -e 'input {
  sqs {
    queue => "TestQueue"
  }
}
filter {
  mutate {
    add_field => {
      "metric-id" => "%{[1.0][metric-id]}"
    }
  }
}
output {
  amazon_es {
    hosts => ["someHost"]
    region => "someRegion"
    index => "myindex"
  }
  stdout { codec => rubydebug }
}'

Questions:

  1. The major differences between your logstash config and mine are that I am reading messages from an SQS queue and that my filter is ordered differently from yours. Do these differences matter?
  2. What if I want to add the additional field called metric-id to my output Elasticsearch index? Is this the right way to go about it? It's not clear from the documentation whether add_field will add a field to the actual output index.
  3. I still cannot get metric-id to be printed correctly. In fact, I get a value of "metric-id" => "%{[1.0][metric-id]}" in my rubydebug output.

Thoughts?
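
For context on the literal value in question 3: Logstash leaves a %{...} reference in place as plain text when the referenced field does not exist on the event, so that output suggests [1.0][metric-id] is not present at that path. A minimal sketch of a guard, assuming the same field names as above, that only adds the field when the reference resolves:

```conf
filter {
  # Only add the field when the nested reference actually exists on the event.
  if [1.0][metric-id] {
    mutate {
      add_field => { "metric-id" => "%{[1.0][metric-id]}" }
    }
  }
}
```

With a guard like this, a missing source field shows up as an absent metric-id rather than as an unexpanded placeholder string in the index.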


(Magnus Bäck) #4
  1. The source of the message doesn't matter as such. The ordering of filters matters, but only in relation to each other. Whether e.g. filters or outputs are listed first doesn't matter.
  2. Yes, you're on the right track. Except for subfields of @metadata, all fields of an event propagate to outputs.
  3. What else do you get on stdout?
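
To illustrate point 2, here is a small sketch contrasting a regular field with a subfield of @metadata. The field name source-queue and its value are made up for the example; the rest assumes the field layout from the stdin test earlier in the thread.

```conf
filter {
  mutate {
    add_field => {
      # Regular field: propagates to every output, including Elasticsearch.
      "metric-id" => "%{[1.0][metric-id]}"
      # Subfield of @metadata (hypothetical name): usable inside the
      # pipeline, but not sent to outputs.
      "[@metadata][source-queue]" => "TestQueue"
    }
  }
}
output {
  # rubydebug hides @metadata unless asked to show it.
  stdout { codec => rubydebug { metadata => true } }
}
```

Without metadata => true, the stdout output would omit the @metadata subfield, which matches how other outputs treat it.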

(Skanjila) #5

Here's the output:

{
"Type" => "Notification",
"MessageId" => "12345",
"TopicArn" => "somearn",
"Message" => "{"1.0":{"metric-id":"45","customer-id":"csid","alarm-id":"Test","alarm-name":"Test","alarm-threshold":"500","alarm-state":"alarm","talker-label":"1.1.1.1","talker-rate-per-sec":"1234.3333","dedupe-string":"Dedupe","last-state-change-time":"2011-11-20T20:41:00.000Z"}}",
"Timestamp" => "2011-11-20T20:43:56.996Z",
"SignatureVersion" => "1",
"Signature" => "someSignature",
"SigningCertURL" => "SomeUrl",
"UnsubscribeURL" => "https://unsubscribeurl",
"@version" => "1",
"@timestamp" => "2011-11-20T21:26:37.962Z",
"metric-id" => "%{[1.0][metric-id]"
}

As you can see, the metric-id still doesn't get set correctly.

Attaching the logstash config again:

bin/logstash -e 'input {
sqs {
queue=>"TestQueue"
}
}
filter {
mutate {
add_field => {
"metric-id" => "%{[1.0][metric-id]"
}
}
}
output {
amazon_es {
hosts => ["somehost"]
region => "someregion"
index => "someindex"
}
stdout {codec=>rubydebug }

Any ideas ?


(Skanjila) #6

Sigh :(, figured it out. To help others, here's the winning lottery ticket for the logstash config:
bin/logstash -e 'input {
  sqs {
    queue => "MyTable"
  }
}
filter {
  json {
    source => "Message"
    target => "doc"
  }
  mutate {
    add_field => {
      "metric-id" => "%{[doc][1.0][metric-id]}"
      "customer-id" => "%{[doc][1.0][customer-id]}"
      "alarm-id" => "%{[doc][1.0][alarm-id]}"
      "alarm-name" => "%{[doc][1.0][alarm-name]}"
      "alarm-threshold" => "%{[doc][1.0][alarm-threshold]}"
      "alarm-state" => "%{[doc][1.0][alarm-state]}"
      "talker-label" => "%{[doc][1.0][talker-label]}"
      "talker-rate-per-sec" => "%{[doc][1.0][talker-rate-per-sec]}"
      "dedupe-string" => "%{[doc][1.0][dedupe-string]}"
      "last-state-change-time" => "%{[doc][1.0][last-state-change-time]}"
    }
  }
}
output {
  amazon_es {
    hosts => ["somehost"]
    region => "someregion"
    index => "someindex"
  }
  stdout { codec => rubydebug }
}'


(Magnus Bäck) #7

Yep, you got it. This is surprising: the default codec of the sqs input is json, so I had a reasonable expectation that the messages read would be subject to JSON decoding. Perhaps this isn't working because the plugin stores the message payload in the Message field rather than message?
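
One possible reading of the stdout output in post #5: the outer document does appear to have been decoded (Type, MessageId and TopicArn show up as separate fields), while Message holds a second JSON document as a plain string. Under that assumption, the case can be reproduced without SQS. The sketch below reuses the stdin setup from post #2; the file name test.config and the shortened sample event are illustrative only.

```conf
# test.config: reproduce the JSON-inside-a-string case without SQS.
input { stdin { codec => "json" } }
filter {
  # Message is a string containing JSON, so parse it into a hash first.
  json {
    source => "Message"
    target => "doc"
  }
  # The nested keys are now addressable under [doc].
  mutate {
    add_field => { "metric-id" => "%{[doc][1.0][metric-id]}" }
  }
}
output { stdout { codec => "rubydebug" } }
```

Feeding it a line such as {"Message":"{\"1.0\":{\"metric-id\":\"45\"}}"} on stdin should exercise the same path as the SQS messages. Removing the json filter from this config would be one way to check whether the reference goes back to being printed literally.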


(Vincent Tran) #8

I'm trying to do something similar to this:

  1. pipe the content of "message" to a new field "raw"
  2. Add new fields by accessing the nested content inside of the original raw data
  3. The fat lady sings?

Not quite. rubydebug output looks fine, but when I switch over to ES output, the fields do not resolve as expected.

Did you encounter this problem?
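
For reference, steps 1 and 2 could be sketched roughly as follows. This is only a guess at the setup described: it assumes message contains JSON text shaped like the payload earlier in the thread, and the doc target and the metric-id field are borrowed from post #6 rather than from this post.

```conf
filter {
  # Step 1: keep a copy of the original payload in "raw".
  mutate {
    add_field => { "raw" => "%{message}" }
  }
  # Parse the copy so that nested keys become addressable.
  json {
    source => "raw"
    target => "doc"
  }
  # Step 2: add top-level fields from the nested content.
  mutate {
    add_field => { "metric-id" => "%{[doc][1.0][metric-id]}" }
  }
}
```

The two mutate blocks are kept separate on purpose, so that the second add_field only runs after the json filter has populated doc.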


(Skanjila) #9

Exactly right. Anyway, on to the next adventure/hacking session with logstash filters :smile:

I have another contextual question: is there a filter in logstash that acts as a data feed, pulling in data from various sources, applying some business logic, and dumping the resulting aggregated data into Elasticsearch? I have a feeling that I need to write something custom for this, but wanted to check what already exists.


(Magnus Bäck) #10

I have another contextual question: is there a filter in logstash that acts as a data feed, pulling in data from various sources, applying some business logic, and dumping the resulting aggregated data into Elasticsearch? I have a feeling that I need to write something custom for this, but wanted to check what already exists.

It's usually inputs that pull data from external sources, and there are obviously a lot of input plugins available. Your question is too broad for a specific answer, but in general Logstash acts on a per-message basis, so correlating data between different data sources or aggregating data typically requires custom code. It's not Logstash's strongest point.

