Delete inner fields (from json) using lookup file



I'm processing quite large files using the json input codec. It's doing a great job of splitting up the data, creating fields and passing it to ES.

However... Some of the documents have a nested value a bit like this:

{ ... stuff up here ...,
    "outer" : {
        "inner_data_one" : val1,
        "inner_data_two" : val2,
        "inner_data_three" : val3
    }
}
This, quite correctly, results in Logstash creating fields like:

"outer" => {
    "inner_data_one" => val1,
    "inner_data_two" => val2,
    "inner_data_three" => val3

Which appear in ES as outer.inner_data_one and so on...

I need a way to delete SOME of those inner values. So far I see that I can do:

if [outer] {
    mutate {
        remove_field => ["[outer][inner_data_one]"]
    }
}
This works and does remove the inner_data_one field, which is great... BUT it is becoming quite laborious to keep the config up to date with potentially scores of inner data fields to be removed.

What I am looking for is a way to look up all the inner field names in a .txt file and if they are found there for them to be removed.

Is this possible?

Many thanks.

PS -- Re-reading this post I suppose what I might really be asking, which is simpler, is can remove_field work with an external file filled with values?

(Magnus Bäck) #2

No, the mutate filter doesn't have such a feature. How often does this list of unwanted fields change? If it's not too often you could generate the necessary configuration. A few other options:
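Generating the configuration could also be scripted. As a rough sketch in Ruby (the helper name and file layout are hypothetical; it assumes one unwanted inner field name per line in the lookup file):

```ruby
# Hypothetical generator: turn a list of unwanted inner field names
# into a mutate filter block that removes them from [outer].
def build_remove_config(field_names)
  refs = field_names.map { |f| "\"[outer][#{f}]\"" }.join(", ")
  "filter {\n  mutate {\n    remove_field => [#{refs}]\n  }\n}"
end

# In practice the names would come from the lookup file, e.g.:
#   names = File.readlines("fields_to_remove.txt").map(&:strip)
puts build_remove_config(["inner_data_one", "inner_data_two"])
```

Re-running the script whenever the .txt file changes keeps the config in sync without hand-editing scores of field names.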

  • You might be able to abuse the translate filter to help out in some way (it can periodically reload a file from disk).
  • The ruby filter can definitely open and read files (but that'd be for each event, so it's costly).
  • The prune filter could also be useful.
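For the ruby option, the per-event file read can be avoided by loading the list once in init. A minimal sketch, assuming a file with one unwanted inner field name per line (the path is hypothetical):

```
filter {
  ruby {
    # Load the unwanted field names once at pipeline startup.
    init => "@unwanted = File.readlines('/etc/logstash/fields_to_remove.txt').map(&:strip).reject(&:empty?)"
    # On each event, remove any listed field nested under [outer].
    code => "
      if event.get('outer')
        @unwanted.each { |name| event.remove('[outer][' + name + ']') }
      end
    "
  }
}
```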


Thanks Magnus,

I'll explore those options. The more I think about this, the more I wonder about the viability of a new filter plugin that compares existing fields against a file of known field names, then drops either the matching or the non-matching fields.

When parsing plain-text logs I usually tag everything I want and then drop anything that does not carry that tag, to ensure sterile output. The json input ( codec => "json" ) is extremely effective, but you get absolutely everything.
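For that whitelist style, the prune filter can express "drop anything not on the list" directly, though note it operates on top-level field names rather than nested ones. A sketch with hypothetical field names:

```
filter {
  prune {
    # Keep only fields whose names match one of these patterns;
    # everything else is dropped.
    whitelist_names => ["^@timestamp$", "^message$", "^outer$"]
  }
}
```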

