Issue with Character Encoding When Receiving Data from RSYSLOG in Logstash 8.4.3

nw-engineer · February 19, 2024, 2:36pm

Hello,

I'm currently using Logstash version 8.4.3 and encountering an issue when processing data received from RSYSLOG. The error message I'm seeing is as follows:

[WARN ][logstash.codecs.plain][main][b162f9e29529bc0184cfb35d26d7bc3a946f5283f15e0938bfcbb4bdce0da719] Received an event that has a different character encoding than you configured. ...omitted... catdesc="フリーウェア\xE3", :expected_charset=>"UTF-8"}

Upon inspecting the logs, it appears that the text, which should read "フリーウェア・ソフトウェアダウンロード", is cut off partway through a character, and the error message above follows.

Additionally, I intentionally created binary data that mimics this truncation and sent it through RSYSLOG to Logstash, which resulted in the error being reproduced 100% of the time.

Based on this, am I correct in understanding that Logstash expects string data to be in UTF-8 encoding and that this error is inevitable when the data gets truncated at the byte "\xE3"?

If that's the case, are there any workarounds or solutions available on the Logstash side to handle this issue? Note that changing the character encoding to us-ascii is not an option due to our requirements.

Thank you for your assistance.

Badger · February 19, 2024, 5:10pm

The plain codec has a charset option that allows you to specify one of dozens of encodings other than UTF-8.

nw-engineer · February 20, 2024, 5:28am

Thank you for your prompt reply. I understand that the plain codec has a charset option that allows specifying various encodings other than UTF-8. However, the data being sent is already in UTF-8, and since I need to handle Japanese characters, using an encoding other than UTF-8 is not a viable option for me.

Given these constraints, my question is whether there is a way to prevent or handle the issue of truncated UTF-8 character sequences within Logstash. Is there a mechanism or configuration that allows Logstash to either gracefully handle these incomplete sequences or ensure that the truncation does not lead to errors?

I appreciate your guidance on how to address this specific challenge.

Badger · February 20, 2024, 5:34am

Which input are you using? Some inputs split data into chunks, which can break multi-byte characters at the boundaries between chunks. The data the input is consuming is not valid UTF-8. If the data being sent is UTF-8, then something is breaking it betwixt.
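
To illustrate the chunk-boundary breakage, here is a minimal Python sketch (not Logstash code). The split offset of 19 bytes is an assumption, chosen so that the first chunk ends in "フリーウェア\xE3" as in the warning above. Decoding each chunk separately fails, while an incremental decoder holds the partial character until the rest arrives.

```python
import codecs

text = "フリーウェア・ソフトウェアダウンロード"
data = text.encode("utf-8")

# Hypothetical chunk boundary: 18 bytes for "フリーウェア" plus the first
# byte (E3) of the 3-byte character "・".
chunks = [data[:19], data[19:]]

# Decoding each chunk on its own fails, because chunk 1 ends mid-character.
try:
    naive = "".join(chunk.decode("utf-8") for chunk in chunks)
except UnicodeDecodeError:
    naive = None

# An incremental decoder buffers the incomplete tail until the next chunk.
decoder = codecs.getincrementaldecoder("utf-8")()
joined = "".join(decoder.decode(chunk) for chunk in chunks)
joined += decoder.decode(b"", final=True)  # raises if bytes are still pending

print(naive)
print(joined == text)
```

This only helps when the missing bytes do arrive in a later chunk. If they are lost for good, the stream is no longer valid UTF-8.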

nw-engineer · February 20, 2024, 5:41am

The input codec is the default setting.

What I would like to ask is whether there are any remedies available in the event that data is missing.

For example, the byte sequence
E3 83 95 E3 83 AA
is read normally as "フリ".

However, the byte sequence
E3 83 95 E3 83 AA E3
fails, because the trailing byte (E3) is a lead byte indicating that another character follows, and its remaining bytes are missing.

I would like to know if there are any remedies for cases where this fails (for example, due to network problems).
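
The two byte sequences above can be reproduced outside Logstash. The following is a Python sketch only, and `trim_incomplete_utf8_tail` is a hypothetical helper, not a Logstash or RSYSLOG feature. It shows one possible remedy applied before the data reaches the decoder: drop an incomplete multi-byte sequence at the end of the message. The built-in `errors="replace"` mode is shown as an alternative that keeps a U+FFFD marker instead.

```python
def trim_incomplete_utf8_tail(data: bytes) -> bytes:
    """Drop a trailing, incomplete UTF-8 sequence; leave other bytes as-is."""
    # Walk back over at most 4 bytes to find the last lead byte.
    for back in range(1, min(4, len(data)) + 1):
        byte = data[-back]
        if byte & 0b1100_0000 == 0b1000_0000:
            continue  # continuation byte, keep looking for the lead byte
        if byte < 0x80:
            need = 1  # ASCII
        elif byte >> 5 == 0b110:
            need = 2
        elif byte >> 4 == 0b1110:
            need = 3
        elif byte >> 3 == 0b11110:
            need = 4
        else:
            return data  # invalid lead byte, not a truncation problem
        # Cut only if fewer bytes are present than the lead byte announces.
        return data[:-back] if back < need else data
    return data


complete = bytes.fromhex("E3 83 95 E3 83 AA")
truncated = bytes.fromhex("E3 83 95 E3 83 AA E3")

print(complete.decode("utf-8"))                             # フリ
print(trim_incomplete_utf8_tail(truncated).decode("utf-8"))  # フリ
print(truncated.decode("utf-8", errors="replace"))          # フリ + U+FFFD
```

Trimming loses the partial character silently, whereas replacement leaves a visible sign that something was cut off. Which one is preferable depends on how the logs are used afterwards.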

system · March 19, 2024, 5:41am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
