Mixed kinds of rename option in a single mutate filter won't work

In a single mutate filter, we can't mix the two kinds of rename option. This is not a configuration issue but a runtime issue, so it is hard to debug. Since this kind of setup doesn't compile to the expected configuration, Logstash should proactively report it at start-up.

Kind #1: rename option for a single field (array form)
Kind #2: rename option for multiple fields (hash form)

mutate {
    rename => [ "[details][data][field1]", "[new_field1]" ]
    rename => {
        "[details][data][field2]" => "[new_field2]"
        "[details][data][field3]" => "[new_field3]"
        "[details][data][field4]" => "[new_field4]"
        "[details][data][field5]" => "[new_field5]"
    }
}

This came to our notice when we tried to group some fields logically in one place in a big config file for better readability, but it caused issues at runtime.

mutate+rename expects a hash. If you specify an array in the configuration then logstash will automatically convert it to a hash. I assume it does this by shifting elements off the array pairwise and creating hash entries from them.
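As a guess at those mechanics (a minimal Ruby sketch, an assumption about the behavior rather than the actual Logstash implementation), an even-length array maps cleanly onto a hash by taking its elements pairwise:

```ruby
# Sketch: fold an even-length rename array into a hash by taking
# elements pairwise (hypothetical, not Logstash source).
rename_array = ["[details][data][field1]", "[new_field1]",
                "[details][data][field2]", "[new_field2]"]

rename_hash = rename_array.each_slice(2).to_h
# => {"[details][data][field1]"=>"[new_field1]",
#     "[details][data][field2]"=>"[new_field2]"}
```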

If you supply an option more than once then the values will be merged. In the past when I did this it almost always did what I expected, but having been amazed a couple of times at what logstash did, I now never supply an option more than once to the same filter instance.

Once all the options are merged, they can be processed. Consider

mutate {
    rename => {
        "field1" => "field2"
        "field2" => "field3"
    }
}

Does the first hash entry get executed first, so that you end up with a [field3]? Or does the second hash entry get executed first (in which case it is a no-op), so that you end up with a [field2]? The answer depends on the version of logstash you are running.
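The two outcomes can be sketched in Ruby over a toy event hash (a hypothetical illustration of processing order, not the mutate filter's actual code):

```ruby
# Apply the renames front-to-back vs back-to-front over a toy event.
renames = { "field1" => "field2", "field2" => "field3" }
event   = { "field1" => "x" }

# Front to back: field1 -> field2, then field2 -> field3.
front_to_back = event.dup
renames.each do |src, dst|
  front_to_back[dst] = front_to_back.delete(src) if front_to_back.key?(src)
end
# => {"field3"=>"x"}

# Back to front: the field2 rename is a no-op, then field1 -> field2.
back_to_front = event.dup
renames.reverse_each do |src, dst|
  back_to_front[dst] = back_to_front.delete(src) if back_to_front.key?(src)
end
# => {"field2"=>"x"}
```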

At every stage of this, order is not guaranteed. It could pop element pairs off the array instead of shifting them. It could merge the second rename hash into the first, or merge the first into the second. And once it has a single hash it can work from front to back or back to front.

If you want to control order then use multiple filter instances.
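For example, the two renames above could be split into separate mutate filters so the order is explicit (a sketch using the same field names):

```
mutate {
    rename => { "field1" => "field2" }
}
mutate {
    rename => { "field2" => "field3" }
}
```

Filters in a pipeline run in the order they are written, so this always produces [field3].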

As to whether a particular option expects a hash or an array, the documentation has many cases where the option description states it requires a hash, but the example shows an array. Always trust the description over the example.

Thanks @Badger for the reply. I agree with you and have noticed the same behavior you describe when trying various configuration setups.

But here I am trying to highlight an issue that only shows up at pipeline runtime, even though Logstash could have reported it at application startup during configuration verification. I am aware that a hash is expected, but an array alone also works. The problem occurs when someone uses both kinds (hash + array, as shown in my example) in a single filter, because Logstash doesn't report it as an invalid configuration. This setup is also not caught by the -t configuration verification option, which considers it valid, but it fails at runtime.

What about adding a good-to-have feature to report this during the configuration syntax verification process itself? Anything reported at runtime leaves you wondering whether it is a data issue or a configuration issue.

Regards,
Tas

OK, I did not understand your original complaint. I agree that it is not ideal. This configuration

input { generator { count => 1 lines => [ '' ] } }
filter {
     mutate { add_field => { "[details][data][field1]" => "alfa" } }
     mutate { add_field => { "[details][data][field2]" => "rOMEO" } }
     mutate {
        rename => [ "[details][data][field1]", "[new_field1]"]
        rename => {
            "[details][data][field2]" => "[new_field2]"
        }
    }
}
output { stdout { codec => rubydebug { metadata => false } } }

Does indeed produce a configuration error

[2021-09-04T12:43:47,459][ERROR][logstash.filters.mutate  ] Invalid setting for mutate filter plugin:

  filter {
    mutate {
      # This setting must be a hash
      # This field must contain an even number of items, got 3
      rename => ["[details][data][field1]", "[new_field1]", ["[details][data][field2]", "[new_field2]"]]
  ...

However, if you add a second entry to the hash

     mutate {
        rename => [ "[details][data][field1]", "[new_field1]"]
        rename => {
            "[details][data][field2]" => "[new_field2]"
            "[details][data][field3]" => "[new_field3]"
        }

You instead get a runtime error

Exception caught while applying mutate filter {:exception=>"Invalid FieldReference: `[\"[details][data][field2]\", \"[new_field2]\"]`"}

In general, you get an (accidental) configuration error for an odd number of hash entries, and a runtime error for an even number of hash entries.
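One plausible explanation (an assumption about the merge, not verified against the Logstash source): the hash entries get appended to the array form as nested [key, value] pairs, so the "even number of items" validation sees three items for one hash entry but four for two:

```ruby
# Hypothetical merge of the duplicate rename options.
array_form  = ["[details][data][field1]", "[new_field1]"]
one_entry   = { "[details][data][field2]" => "[new_field2]" }
two_entries = one_entry.merge("[details][data][field3]" => "[new_field3]")

merged_odd  = array_form + one_entry.to_a
# 3 items -> fails the "even number of items" check at startup

merged_even = array_form + two_entries.to_a
# 4 items -> passes the count check, but items 3 and 4 are nested
# arrays, which would later fail field-reference parsing at runtime
```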

If we reverse the order of entries in the first case

     mutate {
        rename => {
            "[details][data][field2]" => "[new_field2]"
        }
        rename => [ "[details][data][field1]", "[new_field1]" ]

we get no exception, but bad data

   "details" => {
    "data" => {}
},
          "" => "alfa",
"new_field2" => "rOMEO",

So merging arrays and hashes is all kinds of broken. You could open an issue on github, but I doubt it will ever get addressed.


Thank you @Badger for your quick reply. I had already noticed all of these cases when I was debugging my scenario, but I like that you have put them all here in one place with examples.

From all this we can see that mixing hash + array neither works well nor behaves consistently. And we are discussing this because an array alone does work, which is not in the documentation. I have seen many configuration files with one mutate filter having multiple rename options with single array elements. :slightly_smiling_face:
So it would be better if Logstash did a quick check and simply disallowed such configurations. That way, sticking to a hash would become the recommended way, as per the documentation, and leave no chance of failure.

Just one step ahead in maturing the product.

Regards,
Tas