Delete inner fields (from json) using lookup file


I'm processing quite large files using the json input codec. It's doing a great job of splitting up the data, creating fields and passing it to ES.

However, some of the documents contain a nested object, a bit like this:

{ .... <stuff up here ... ,
    "outer" : {
        "inner_data_one" : val1,
        "inner_data_two" : val2,
        "inner_data_three" : val3
    }

This, quite correctly, results in Logstash creating fields like:

"outer" => {
    "inner_data_one" => val1,
    "inner_data_two" => val2,
    "inner_data_three" => val3
}

These appear in ES as outer.inner_data_one and so on.

I need a way to delete SOME of those inner fields. So far I can see that I can do this:

if [outer] {
  mutate {
    remove_field => ["[outer][inner_data_one]"]
  }
}

This works and does remove the inner_data_one field, which is great. BUT it is becoming laborious to keep the config up to date with potentially scores of inner fields to be removed.
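For what it's worth, remove_field takes an array, so the unwanted fields can at least be collected in a single mutate block rather than one block per field. This is only a sketch using the field names from the example in the question; the list still has to be maintained by hand:

```
filter {
  if [outer] {
    mutate {
      # remove_field accepts an array, so one mutate can drop several nested fields
      remove_field => [
        "[outer][inner_data_one]",
        "[outer][inner_data_two]"
      ]
    }
  }
}
```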

What I am looking for is a way to look up each inner field name in a .txt file and, if it is listed there, remove it.

Is this possible?

Many thanks.

PS: Re-reading this post, I suppose what I am really asking (which is simpler) is: can remove_field work with an external file full of field names?

No, the mutate filter doesn't have such a feature. How often does this list of unwanted fields change? If not very often, you could generate the necessary configuration from the file. A few other options:

  • You might be able to abuse the translate filter to help out in some way (it can periodically reload a file from disk).
  • The ruby filter can definitely open and read files (but that'd be for each event, so it's costly).
  • The prune filter could also be useful.
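A minimal sketch of the ruby filter route, assuming Logstash 5 or later (for the event.get / event.remove API) and a hypothetical file /etc/logstash/drop_fields.txt containing one inner field name per line. The file is read in the ruby filter's init option, which runs once when the pipeline starts rather than once per event, so each event only costs an array intersection:

```
filter {
  if [outer] {
    ruby {
      # Runs once at pipeline startup: load the (hypothetical) list of
      # inner field names to remove, one per line, skipping blank lines.
      init => "@drop_fields = File.readlines('/etc/logstash/drop_fields.txt').map(&:strip).reject(&:empty?)"
      # Runs for every event: remove each key under [outer] that appears in the list.
      code => "
        outer = event.get('[outer]')
        if outer.is_a?(Hash)
          (outer.keys & @drop_fields).each { |k| event.remove('[outer][' + k + ']') }
        end
      "
    }
  }
}
```

Because the list is only read at startup, edits to the file take effect after Logstash is restarted (or the pipeline is reloaded).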

Thanks Magnus,

I'll explore those options. The more I think about this, the more I wonder about the viability of a new filter plugin that compares existing fields against a file of known field names and then drops either the matching fields or the non-matching ones.
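Before writing a full plugin, the "drop matching or drop non-matching" idea can be approximated with a ruby filter. In this sketch the file path and the @mode switch are hypothetical: 'drop' removes the inner fields listed in the file, and 'keep' removes every inner field that is not listed. It assumes the same event.get / event.remove API as any Logstash 5+ ruby filter:

```
filter {
  if [outer] {
    ruby {
      # Runs once at startup: load the hypothetical name list and pick a mode.
      init => "
        @names = File.readlines('/etc/logstash/inner_fields.txt').map(&:strip).reject(&:empty?)
        @mode = 'keep'   # hypothetical switch: 'keep' (whitelist) or 'drop' (blacklist)
      "
      # Runs per event: work out which keys under [outer] to remove, then remove them.
      code => "
        outer = event.get('[outer]')
        if outer.is_a?(Hash)
          to_remove = @mode == 'keep' ? outer.keys - @names : outer.keys & @names
          to_remove.each { |k| event.remove('[outer][' + k + ']') }
        end
      "
    }
  }
}
```

The 'keep' mode gives the same "sterile output" guarantee as tagging: any new inner field that appears in the JSON is dropped until it is added to the file.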

When parsing plain-text logs, I usually tag everything I want and then drop anything without that tag to ensure clean output. With JSON input (codec => "json"), parsing is extremely effective, but you get absolutely everything.
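The tag-then-drop approach for plain-text logs looks roughly like this. It is only a sketch: the COMBINEDAPACHELOG grok pattern and the tag name "wanted" are placeholders for whatever actually identifies the lines you want to keep:

```
filter {
  grok {
    # Placeholder pattern; add_tag is only applied when the match succeeds
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    add_tag => ["wanted"]
  }
  # Drop every event that did not pick up the tag
  if "wanted" not in [tags] {
    drop { }
  }
}
```

On a failed match grok adds _grokparsefailure to [tags] by default, so the tags field exists whichever way the match goes when the conditional runs.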