Logstash: How to extract a nested field from a JSON log and index only the content of that nested field

We have some logs in JSON format with a nested field called "data". We are looking for an option to extract only the content of this nested field and send it for indexing with ES.

Actual log format:

{"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4", "data":{"nested_field1":"nested_value1","nested_field2":"nested_value2", "nested_field3":"nested_value3"}}

What needs to be sent to ES:

{"nested_field1":"nested_value1","nested_field2":"nested_value2", "nested_field3":"nested_value3"}

I tried the Logstash config below, but it does not work:

input {
  file {
    type => "json"
    path => "/home/ranjith/logstash.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  json {
    source => "message"
  }

  mutate {
    add_field => { "data" => "%{[message][data]}" }
    remove_field => "message"
  }
}
output {
  stdout { codec => json }
}

Any help is appreciated.

What is your output? You need to share the output you are getting.

Also, since you used the json filter to parse your message field, the parsed fields are placed at the root of the event. You will therefore have a data field, not a message.data field, so you do not need that mutate filter: the data field already exists.
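
To illustrate the point about the root of the event, here is a minimal plain-Ruby sketch. It is not Logstash itself: the `event` hash is only a stand-in for a Logstash event, and the merge is a rough approximation of what the json filter does when no target is configured.

```ruby
require "json"

# Stand-in for a Logstash event: before the json filter runs,
# the raw log line sits in the "message" field as a string.
message = '{"field1":"value1", "data":{"nested_field1":"nested_value1","nested_field2":"nested_value2"}}'
event = { "message" => message }

# Approximation of the json filter without a target: parsed keys land at the root.
event.merge!(JSON.parse(event["message"]))

puts event.keys.inspect      # "data" is now a root-level key
puts event["data"].inspect   # the nested hash
puts event["message"].class  # still a plain String, so [message][data] has nothing to resolve
```

This is why `%{[message][data]}` in the mutate filter does not pick up the nested object: message stays a string, and the parsed data field sits next to it.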

If you want to limit the fields that are sent to Elasticsearch, you will need to use the prune filter.
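
One thing to keep in mind: as far as I know from the prune filter documentation, the entries in whitelist_names are treated as regular expressions matched against field names. The plain-Ruby sketch below only imitates that behaviour; `prune_whitelist` and the "metadata" field are hypothetical names used for illustration, not part of Logstash.

```ruby
# Hypothetical event with an extra field whose name merely contains "data".
event = {
  "field1"   => "value1",
  "metadata" => "x",
  "data"     => { "nested_field1" => "nested_value1" }
}

# Keep a field only if at least one whitelist pattern matches its name.
def prune_whitelist(event, patterns)
  regexes = patterns.map { |p| Regexp.new(p) }
  event.select { |name, _value| regexes.any? { |re| re.match?(name) } }
end

loose  = prune_whitelist(event, ["data"])    # unanchored pattern
strict = prune_whitelist(event, ["^data$"])  # anchored pattern

puts loose.keys.inspect
puts strict.keys.inspect
```

If other field names in your logs can contain the whitelisted word, anchoring the pattern (for example "^data$") is probably the safer choice.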

Hello leandrojmp,

Thanks for your support on this. I was able to make some progress with the help of your suggestions. With the new Logstash config I can keep only the data field, but the nested fields are still wrapped inside it. I want to move all the fields out of the "data" nesting so that they sit at the root of the document.

Input message:
{"field1":"value1", "field2":"value2", "field3":"value3", "field4":"value4", "data":{"nested_field1":"nested_value1","nested_field2":"nested_value2", "nested_field3":"nested_value3"}}

Current output with the Logstash config below:
{"data":{"nested_field3":"nested_value3","nested_field1":"nested_value1","nested_field2":"nested_value2"}}

Expected output:
{"nested_field3":"nested_value3","nested_field1":"nested_value1","nested_field2":"nested_value2"}

We cannot use static field names, because the fields under data{} can be dynamic. I would need something like [data][*].

New logstash config:

input {
  file {
    type => "json"
    path => "/home/ranjith/logstash1.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  json {
    source => "message"
  }
  prune {
    whitelist_names => ["data"]
  }

  mutate {
    remove_field => [ "message" ]
  }
}
output {
  stdout { codec => json }
}

I think that you will need to use the ruby filter to put the nested fields under data into the root level of the document.

I'm not an expert in ruby, but this other question has an example that may work in your case.

It would be something like this, but you will need to test it out.

ruby {
    code => 'event.get("data").each { |k, v| event.set(k, v) }'
}
mutate {
    remove_field => [ "data" ]
}

Those filters would need to be after the prune filter.

You can reduce that to

ruby {
    code => 'event.remove("data").each { |k, v| event.set(k, v) }'
}
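
For anyone who wants to try the logic outside a pipeline, here is a minimal plain-Ruby sketch of the same idea. `FakeEvent` is a hypothetical stand-in that mimics only top-level get/set/remove; it is not the real Logstash event class. The sketch also adds a guard, which is my own addition: an event without a data field would otherwise call each on nil and raise a NoMethodError.

```ruby
# FakeEvent is a hypothetical stand-in, NOT the real Logstash event class.
# It mimics only get/set/remove on top-level fields.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end

  # Returns the removed value, or nil when the field is absent.
  def remove(name)
    @fields.delete(name)
  end

  def to_hash
    @fields
  end
end

event = FakeEvent.new(
  "data" => { "nested_field1" => "nested_value1", "nested_field2" => "nested_value2" }
)

# Same idea as the one-liner above, guarded for events that carry no "data" hash.
data = event.remove("data")
data.each { |k, v| event.set(k, v) } if data.is_a?(Hash)

# An event without "data" passes through untouched instead of raising.
other = FakeEvent.new("field1" => "value1")
other_data = other.remove("data")
other_data.each { |k, v| other.set(k, v) } if other_data.is_a?(Hash)

puts event.to_hash.inspect
puts other.to_hash.inspect
```

Note that a nested key with the same name as an existing root field overwrites that field. The guarded form should carry over to the ruby filter's code option in the same way, though I have not tested it in a live pipeline.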

That worked. You are a saviour! Thank you for all your support.
