I am using kv to parse a certain log that contains comma-separated key-value pairs. However, the keys may contain multiple space-separated words with inconsistent capitalization, and I'm trying to bring the resulting field names to a defined naming convention - no spaces, every word capitalized, i.e.:
Input: Field One: value 1, Field two: value 2, Field three: "value 3"
Output:
Right now, I remove the spaces in the field names using kv, and then I'm left with inconsistently-capitalized field names, and whatever does not fit the convention, I rename using mutate:
kv {
source => "message"
target => "processed_message"
field_split => ","
value_split_pattern => ": "
# Most keys already capitalize every word, which is the chosen naming convention;
# "capitalize" only works for the first word and breaks the rest.
#transform_key => "capitalize"
remove_char_key => "- "
trim_value => "\""
}
mutate {
rename => { ... }
}
I've been trying to find a universal solution for this case, since new fields may appear, and I don't want to amend the config every time. Is there a right way to do this with available plugins, or can it only be done with Ruby code?
This is the code that I came up with. Works well so far. Feel free to use, but do mind that I do not know Ruby at all.
filter {
# Parse out comma-separated key-value pairs.
# Commas are escaped by quoting the string; kv seems to handle this if trim_value is set, but this is not documented.
# Space removal and capitalization is handled by ruby code below.
kv {
source => "message"
target => "processed_message"
field_split => ", "
value_split_pattern => ": "
trim_value => "\""
remove_field => [ "message" ]
}
# Enforce field naming convention for the target container:
# Alphanumeric characters only, every word capitalized.
ruby {
code => "
# Only work on target container fields
target_container = '[processed_message]'
keys = event.get(target_container).to_hash.keys
keys.each { |key|
# Split on all non-alphanumeric characters - spaces, dashes, brackets, etc.
# Should skip the loop if the array is empty.
words = key.split(/[^[:alnum:]]+/)
words.each_index { |i|
words[i].capitalize!
}
# You should use words.flatten.join('') to account for nested fields,
# but this should never happen, so I'd rather avoid the overhead.
new_key = words.join('')
# Rename the resulting field
event.set(
target_container + '[' + new_key + ']',
event.remove(target_container + '[' + key + ']')
)
}
# Exception handling
rescue Exception => e
event.set('logstash_ruby_exception', 'capitalize: ' + e.message)
"
}
}
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.