sreeram
(sreeram)
July 29, 2016, 7:02am
1
Hi,
Below is the Logstash file input configuration I use to read the logs generated by my Windows application:
input {
  file {
    path => ["D:/ELK/LoggerTestApp/Server/Logger/GetAssetPointerById/**/*.txt"]
    codec => plain { charset => "UTF-16" }
    sincedb_path => ["D:/ELK/logstash/since.db"]
    start_position => "beginning"
  }
}
My application is configured to write logs in Unicode/UTF-16 encoding only. Once the logs are read and shipped to Elasticsearch, I see an invalid character (�) in each log entry.
Please advise me on how to avoid these invalid characters.
Environment Details
Operating System : Win 7, Win 2008 R2
ElasticSearch : elasticsearch-2.3.4
Logstash : logstash-2.3.4
Kibana : kibana-4.5.3-windows
Thanks in advance.
Did you resolve this?
Did you try a different charset, like charset => "ISO-8859-1"?
Or are you looking for elasticsearch to handle the charset differently: https://www.elastic.co/guide/en/elasticsearch/guide/current/unicode-normalization.html
sreeram
(sreeram)
November 7, 2016, 7:35am
3
Hi Matthew,
Thanks for your response.
We're still living with the invalid character issue.
I'll try the suggestions provided.
mkm
(Mohamed KHEMAKHEM)
May 28, 2017, 9:12am
4
Hi,
Did you resolve this?
CDR
(Colton)
June 20, 2017, 5:38pm
5
Was there any resolution here? I am currently dealing with the same situation.
The question mark in a black diamond is the Unicode replacement character (U+FFFD). It is emitted when the UTF-16 -> UTF-8 character conversion fails.
This piece of config, codec => plain { charset => "UTF-16" }, tells Logstash: "Treat all text as UTF-16 and convert it to UTF-8."
There may be some illegal surrogates in the input (http://unicode.org/faq/utf_bom.html#utf16-7),
or maybe the charset conversion library we use does not deal with noncharacters (http://www.unicode.org/faq/private_use.html#noncharacters) very well.
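To make the failure mode concrete, here is a small Python sketch of how a UTF-16 decode can produce the replacement character. It uses Python's built-in codec, not the conversion library Logstash uses, so treat it as an illustration of the general mechanism only. The helper name decode_utf16le and the sample byte strings are made up for this example, and little-endian byte order is an assumption.

```python
# Illustration only: Python's UTF-16 decoder, not Logstash's converter.
REPLACEMENT = "\ufffd"  # U+FFFD, rendered as a question mark in a black diamond


def decode_utf16le(raw: bytes) -> str:
    """Decode UTF-16LE bytes, substituting U+FFFD for undecodable input."""
    return raw.decode("utf-16-le", errors="replace")


# Well-formed input decodes cleanly.
good = "Hi".encode("utf-16-le")

# A high surrogate (0xD800) that is not followed by a low surrogate is illegal.
lone_surrogate = b"\x00\xd8" + "A".encode("utf-16-le")

# An odd number of bytes leaves half a code unit dangling at the end.
dangling_byte = good + b"\x0d"

for label, raw in [("good", good),
                   ("lone surrogate", lone_surrogate),
                   ("dangling byte", dangling_byte)]:
    text = decode_utf16le(raw)
    print(label, repr(text), "contains U+FFFD:", REPLACEMENT in text)
```

If the replacement character always shows up in the same position (for example at the end of every line), that hints at a systematic cause such as a stray byte per line rather than random corruption in the file.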
Kurt_S
(Kurt Schraeyen)
January 11, 2018, 4:35pm
7
Hi,
I had the same kind of problem: a "�" (black diamond or cube) at the end of each line after converting from UTF-16.
I don't know what the original character was (a carriage return, a vertical tab, or something like that), but I did not need it.
I worked around the problem by stripping the "�" from the message field:
mutate {
  gsub => ["message","�",""]
}
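For anyone who wants to see what that substitution does outside Logstash, here is a rough Python equivalent. The function name strip_replacement and the sample message are invented for illustration. Note that this only removes the U+FFFD marker after the fact; whatever the failed conversion could not decode is already lost by that point.

```python
REPLACEMENT = "\ufffd"  # the character shown as "�"


def strip_replacement(message: str) -> str:
    """Remove every U+FFFD from the message, like the gsub above."""
    return message.replace(REPLACEMENT, "")


# Hypothetical event with a trailing replacement character.
event = {"message": "GetAssetPointerById completed\ufffd"}
event["message"] = strip_replacement(event["message"])
print(repr(event["message"]))
```

This is a workaround rather than a fix: it hides the symptom, which is fine when the undecodable character was not needed, as in my case.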