Extract certain fields from JSON

Hi,

I am forwarding Filebeat logs to Elasticsearch (ES), and I would like to extract only certain fields from the JSON message and write them to ES.

For example, if I have the JSON message below:

{
   "field1": "info",
   "field2": "info2",
   "field3": {
      "field3a": {
         "field3b": {
            "field4": 123,
            "field5": "abc",
            "field6": {
               "field7": 456
            }
         }
      }
   }
}

In Kibana Discover, this whole JSON object is stored as a string in the message field. Is it possible to extract and store only field4, field5, and field7?

I've tried the decode_json_fields processor, but it seems to extract the entire JSON object and all its fields. This "explosion" of data caused the data size to exceed some limit, so the message was not sent to ES at all.

Thank you.

Curious how many fields are in this object?

But no, there is no direct "selector"; there is only a depth setting (max_depth).

Perhaps you could decode the whole JSON and then drop the unneeded fields with a drop_fields processor, with some conditions.

Just a thought... but this could also be memory/CPU intensive.

There are probably more than 30 fields in this object.

I can try drop_fields. Do you know whether dropping, say, field3b will also drop all its children (field4 to field7), so I don't have to specify each field?

Ohhh, I was thinking you meant hundreds of fields, so I am a bit confused by this statement (I guess you have some other mapping issues or many fields already).

Anyway, yes, when you drop the parent field it should drop all the children.

Give it a try. You can also do this with an ingest pipeline in Elasticsearch so the logic is centralized.
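As a rough sketch of the "decode, then drop" idea done in an ingest pipeline rather than in Filebeat: the body below could be stored with PUT _ingest/pipeline/<name>. It assumes the raw JSON string arrives in the message field, and "payload" is a hypothetical target field name chosen only for illustration. The remove processor takes a list of paths, and removing a parent object removes everything under it.

```json
{
  "description": "Sketch only: decode message into a hypothetical 'payload' field, then remove unwanted subtrees",
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "payload"
      }
    },
    {
      "remove": {
        "field": ["payload.field1", "payload.field2"],
        "ignore_missing": true
      }
    }
  ]
}
```

With the sample document, this would leave only the payload.field3 subtree, which holds field4, field5, and field7. Adding payload.field3.field3a.field3b to the list would instead remove field4 through field7 in one go, which is the parent/children behaviour asked about above.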

I think that in this case the best solution would be to use an ingest pipeline, as suggested by @stephenb.

You would need to use the json processor with a custom field in the target_field option.

Then you could use a couple of rename processors to rename the desired fields and after that you would remove the top-level field.

For example, if you use _tmp as the target field you would have something like this after the json processor:

{
   "_tmp": {
      "field1": "info",
      "field2": "info2",
      "field3": {
         "field3a": {
            "field3b": {
               "field4": 123,
               "field5": "abc",
               "field6": {
                  "field7": 456
               }
            }
         }
      }
   }
}

Since you want just field4, field5 and field7, you could use a rename processor on each of them. For example, you would rename _tmp.field3.field3a.field3b.field4 to field4.

Then after the renames you would remove the entire _tmp field.
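The steps above could be sketched as the following request body for POST _ingest/pipeline/_simulate, which runs an inline pipeline against a test document without indexing anything. It assumes the raw JSON string is in the message field and uses _tmp as the temporary target, as in the example above. The ignore_missing flags are an addition here, so that events lacking one of the fields do not fail the pipeline.

```json
{
  "pipeline": {
    "description": "Sketch only: keep field4, field5 and field7 from the JSON in message",
    "processors": [
      {
        "json": {
          "field": "message",
          "target_field": "_tmp"
        }
      },
      {
        "rename": {
          "field": "_tmp.field3.field3a.field3b.field4",
          "target_field": "field4",
          "ignore_missing": true
        }
      },
      {
        "rename": {
          "field": "_tmp.field3.field3a.field3b.field5",
          "target_field": "field5",
          "ignore_missing": true
        }
      },
      {
        "rename": {
          "field": "_tmp.field3.field3a.field3b.field6.field7",
          "target_field": "field7",
          "ignore_missing": true
        }
      },
      {
        "remove": {
          "field": "_tmp",
          "ignore_missing": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "{\"field1\":\"info\",\"field2\":\"info2\",\"field3\":{\"field3a\":{\"field3b\":{\"field4\":123,\"field5\":\"abc\",\"field6\":{\"field7\":456}}}}}"
      }
    }
  ]
}
```

The simulated document should end up with field4, field5, and field7 at the top level and no _tmp field. The original message string is left in place in this sketch; if it is not needed, it could be added to the remove processor. Once the result looks right, the "pipeline" object can be stored with PUT _ingest/pipeline/<name> and referenced from Filebeat through the pipeline setting of the Elasticsearch output.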
