Removing special characters from field name

Rob_wylde · January 5, 2020, 3:49am

if "Holding Dealer Organic ID"
{
        mutate { add_tag => ["holding_dealer"] }
        mutate { lowercase => ["Holding Dealer Organic ID"] }
}

lowercase is not working, nor is the rename function, but the tag gets added, so the 'if' matches. I believe this is because of special characters in the field name that are only visible via cat -e filename. This is what the field name looks like in the raw file:

M-oM-;M-?Holding Dealer Organic

I've tried using 'sed' to remove these characters, but it makes a mess of other fields. My question is: how can I remove these characters in logstash so that I can actually make use of the mutate functionality on this field?

Thanks

Badger · January 5, 2020, 10:54pm

That's a byte order mark (BOM). I would do this in a ruby filter:

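ruby {
    code => '
        # Copy every field to a cleaned-up name.
        # someFunction is a placeholder for the cleanup you choose.
        # The original field is left in place; remove it separately if needed.
        event.to_hash.each { |k, v|
            newk = k.someFunction()
            event.set(newk, v)
        }
    '
}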

In your case, .someFunction might be a straight gsub of the BOM to "", or you might want to remove all control characters and characters above 127 (code available here), or you might want to be much stricter and keep only a whitelist of characters, with something like gsub(/[^-_a-zA-Z0-9]/, ""). Use gsub rather than gsub! here, since gsub! returns nil when nothing was replaced.

Edited to add: if your events contain a byte order mark, then your input is probably not set to consume UTF-8. I would expect (but have not tested) that changing the encoding on the input would not only remove the BOM but also ensure you get the right representation for any other obscure characters in the events (field values as well as names), just in case someone sends some Simplified Chinese your way.
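
For anyone who wants to try the key-renaming idea outside Logstash first, here is a minimal sketch in plain Ruby. It uses an ordinary Hash as a stand-in for the event, and strip_bom and clean_keys are hypothetical helper names, not part of any Logstash API. It only handles a BOM at the very start of a key; the stricter variants mentioned above would need a different regex.

```ruby
# Hypothetical helper (not a Logstash API): remove a BOM (U+FEFF)
# only when it is the first character of the name.
def strip_bom(name)
  name.sub(/\A\uFEFF/, "")
end

# Build a new hash whose keys have been cleaned; values are untouched.
# A plain Hash stands in for the Logstash event here.
def clean_keys(fields)
  fields.each_with_object({}) do |(key, value), cleaned|
    cleaned[strip_bom(key)] = value
  end
end

fields = { "\uFEFFHolding Dealer Organic ID" => "ABC123", "Other" => "x" }
p clean_keys(fields).keys
```

Inside a ruby filter the same sub call could take the place of the someFunction placeholder, assuming the BOM is the only unwanted character in the field name.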

Rob_wylde · January 6, 2020, 12:40am

I decided to just fix the source text. I appreciate the ruby code though.

sed -i '1 s/^.//' "$input"

Removes the BOM and life is grand once again. Found this buried on a Stack Overflow page.
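
One caveat with the sed one-liner: it deletes the first character of line 1 whether or not it is a BOM, so running it on a file that has no BOM eats a real character. A rough sketch of a conditional version in Ruby, keeping the thread's language, might look like this. without_bom is a hypothetical helper name, and the sample string is made up for illustration.

```ruby
# The three bytes of a UTF-8 byte order mark, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: return the content without a leading UTF-8 BOM.
# Content that does not start with a BOM is returned unchanged.
def without_bom(raw)
  bytes = raw.b                                   # binary copy, compare byte by byte
  bytes = bytes.byteslice(3..-1) if bytes.start_with?(UTF8_BOM)
  bytes.force_encoding(Encoding::UTF_8)           # hand back a UTF-8 string
end

puts without_bom("\uFEFFHolding Dealer Organic ID,Name")
```

To fix a file in place you would read it, pass the contents through the helper, and write the result back. That is left out here to keep the sketch short.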

andres-perez · January 7, 2020, 2:03pm

Just to add more possibilities - sometimes the regex \xEF\xBB\xBF (more info) can be useful to work with BOM within logstash or filebeat configurations.

I had a simpler use case (no need to mess with field names) where the first line of a file had to be discarded. Setting the UTF-8 codec didn't help to hide the BOM, which may be an issue with moving files between different systems (Windows / Linux). The line starts with the BOM followed by some static content.

That can be achieved in logstash with regex. I chose to enclose the pattern \xEF\xBB\xBF in a non-capturing group (?: ... ) whose presence is optional ? and can be found just after the beginning ^ of the line:

# logstash: drop messages that start with BOM
if [message] =~ /^(?:\xEF\xBB\xBF)?contents_of_1st_line_that_must_be_excluded.*/ {
  drop { }
}

or filebeat configuration:

filebeat.prospectors:
- input_type: log
  paths:
    - path/to/files/*.csv
  exclude_lines: ['^(?:\xEF\xBB\xBF)?contents_of_1st_line_that_must_be_excluded.*', 'other patterns']
  encoding: utf-8
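
The optional-BOM pattern used in both configurations above can be checked on its own before putting it into a pipeline. This is only a sketch: HEADER is a made-up stand-in for contents_of_1st_line_that_must_be_excluded, drop_line? is a hypothetical helper, and the BOM is written as \uFEFF here rather than with the \xEF\xBB\xBF byte escapes.

```ruby
# Hypothetical header text, standing in for the real first line.
HEADER = "id,name,dealer"

# Optional BOM in a non-capturing group, anchored at the start of the
# string, followed by the literal header text.
FIRST_LINE = /\A(?:\uFEFF)?#{Regexp.escape(HEADER)}/

# Hypothetical helper: true when the line is the header, with or without BOM.
def drop_line?(line)
  FIRST_LINE.match?(line)
end

["\uFEFFid,name,dealer", "id,name,dealer", "1,Bob,Acme"].each do |line|
  puts "#{line.inspect} dropped: #{drop_line?(line)}"
end
```

Because the group is optional, the same pattern keeps working if the file is later saved without a BOM.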
