Logstash - Charset 1252 - Problems with Conversion

lgarcia · May 31, 2017, 11:45am

Hi everyone

I have been struggling to get Logstash to apply the Windows charset CP1252 to a UDP input listener. I'm running Logstash 5.2.2 on Windows Server 2012.

The charset setting does not seem to take effect. After I send the data, the characters show up like this:
Sent data: [é ó ção]
Result in Elasticsearch: [\xE9 \xF3 \xE7\xE3o]
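The escaped values in the result are the single-byte CP1252 codes for é (0xE9), ó (0xF3), ç (0xE7) and ã (0xE3). That suggests the raw CP1252 bytes reached Elasticsearch without being transcoded to UTF-8. A minimal Python sketch (Python is used here only to illustrate the byte-level behavior; it is not part of the Logstash setup) shows the correct decoding next to the broken one:

```python
# Sample text from the post
SAMPLE = "é ó ção"

# Bytes a Windows-1252 (CP1252) sender puts on the wire: one byte per character
cp1252_bytes = SAMPLE.encode("cp1252")

# Correct handling: decode with the charset the sender actually used
decoded_ok = cp1252_bytes.decode("cp1252")

# Broken handling: treat the same bytes as UTF-8. The lone bytes 0xE9, 0xF3,
# 0xE7 and 0xE3 are not valid UTF-8, so an escaping decoder renders them as
# \xNN sequences instead of characters.
decoded_as_utf8 = cp1252_bytes.decode("utf-8", errors="backslashreplace")

print(cp1252_bytes)     # b'\xe9 \xf3 \xe7\xe3o'
print(decoded_ok)       # é ó ção
print(decoded_as_utf8)  # \xe9 \xf3 \xe7\xe3o
```

The last line has the same shape as the value seen in Elasticsearch (Python prints the hex digits in lowercase), which is what you would expect if the CP1252 conversion is skipped somewhere in the pipeline.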

The UDP input configuration is below:

input
{
  udp
  {
    port => 5140
    codec => plain
    {
      charset => "CP1252"
    }
    type => "log4net"
  }
}

The output is Elasticsearch, configured as follows:

output
{
  stdout {
    codec => rubydebug
  }
  if [type] == "log4net" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "log-%{+YYYY.MM.dd}"
    }
  }
}

I tried sending the data directly to Elasticsearch, without Logstash, and it worked.

Any tips to solve this are very welcome.

Thanks & Regards

lgarcia · June 6, 2017, 8:51pm

After some more tests, it seems that the problem happens when grok applies the filter, before the event reaches the output.
Has anyone experienced this issue?

Thanks

pts0 · June 6, 2017, 8:54pm

I don't see any grok in your config.

lgarcia · June 7, 2017, 4:41pm

Sorry, the filter is:

filter
{
  if [type] == "log4net"
  {
    grok
    {
      # Parse the log4net line; "message" is removed only if the pattern matches
      match => { "message" => "(?m)%{TIMESTAMP_ISO8601:sourceTimestamp} %{DATA:userName} %{WORD:machineName} %{DATA:loggerName}: %{DATA:threadId} %{LOGLEVEL:level} %{DATA:systemname} %{WORD:environment} %{WORD:site} %{GREEDYDATA:tempMessage}" }
      remove_field => [ "message" ]
    }
    if !("_grokparsefailure" in [tags])
    {
      # Put the free-text part of the line back into "message"
      mutate {
        replace => { "message" => "%{tempMessage}" }
      }
    }
    # Drop the temporary fields with a single remove_field list
    mutate {
      remove_field => [ "tempMessage", "tempHost" ]
    }
  }
}
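To reproduce the problem without log4net, a small test sender can push one CP1252-encoded line to the UDP input and let it run through the grok filter. This is a sketch under assumptions: the host, the sample field values and the helper names `build_payload` and `send_udp` are hypothetical, and the line is only intended to fit the grok pattern above; the real log4net layout may differ.

```python
import socket

# Hypothetical log4net-style line. Tokens are chosen to fit the grok pattern:
# timestamp, user, machine, logger followed by ":", thread, level,
# system, environment, site, then the free-text message.
SAMPLE_LINE = (
    "2017-06-07 16:41:00,123 jdoe SRV01 MyApp.Logger: 12 INFO "
    "MySystem PROD SiteA [é ó ção]"
)


def build_payload(line: str) -> bytes:
    """Encode the line as Windows-1252, as a CP1252 sender would."""
    return line.encode("cp1252")


def send_udp(payload: bytes, host: str = "127.0.0.1", port: int = 5140) -> None:
    """Send one datagram to the Logstash UDP input (assumed to run locally)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    send_udp(build_payload(SAMPLE_LINE))
```

Comparing the `rubydebug` output for this datagram with and without the filter section should show at which stage the accented characters turn into escaped bytes.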

pts0 · June 8, 2017, 6:58am

I really don't understand your question, sorry. What do you mean?

lgarcia · June 8, 2017, 5:08pm

When I send a given message to Logstash, say "[é ó ção]", it seems that after the filter step, where the grok filter transforms the message, the initial value "[é ó ção]" is wrongly converted to the characters "\xE9 \xF3 \xE7\xE3o". I don't understand the reason for this behavior. If I remove the filter step, the message arrives correctly at the output, since I can see it properly in Kibana.

system · July 6, 2017, 5:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

[LOGSTASH] - Plugin input-udp and message charset · Logstash · 2 replies · 370 views · October 2, 2022
Python SocketHandler charset · Logstash · 4 replies · 1677 views · November 22, 2017
UDP-input Receiving an encoding value � · Logstash · 8 replies · 226 views · September 19, 2023
Syslog input, received character set? · Logstash · 4 replies · 3760 views · July 6, 2017
Can give a examplge about how logstash use udp data to es? · Logstash · 17 replies · 832 views · April 13, 2018