Character encoding problems with Filebeat & Logstash

Hello,

I've read many topics about this problem, but it's still not resolved for me.

I have Filebeat 6.2.2 installed on Windows; it sends logs to Logstash 6.2.2 installed on Linux.

This is my Filebeat config:

- type: log
  scan_frequency: 10s
  paths:
    - C:\logs\example.log
  encoding: utf-8

Logstash config:

input {
  beats {
    port => 5044
    codec => json { charset => "UTF-8" }
  }  
}

output {
  http {
    automatic_retries => 10
    content_type => "application/json"
    format => "json"
    http_method => "post"
    ignorable_codes => 409
    keepalive => true
    url => "http://localhost:9090"
  }

  file {
    path => "D:\\result.txt"
    codec => json { charset => "UTF-8" }
  }
}

The logs are:

2018-09-26 16:05 - First: é, Second: è
2018-09-26 16:06 - Third: à, Fourth: â

Results in my web application (linked to a database) and in the file D:\result.txt:

2018-09-26 16:05 - First: ├®, Second: ├¿
2018-09-26 16:06 - Third: à, Fourth: â

Results from the console (Ruby debug output):

2018-09-26 16:05 - First: é, Second: è
2018-09-26 16:06 - Third: à, Fourth: â

I tried many charsets, following these links:
https://www.elastic.co/guide/en/logstash/current/plugins-codecs-json.html
https://www.elastic.co/guide/en/logstash/current/plugins-codecs-plain.html

But I always get the same issue.

I also changed the configuration:

  • removed encoding: utf-8 from the Filebeat configuration
  • removed codec => json { charset => "UTF-8" } from the Logstash config

And nothing worked.

Can anyone help with this? Thanks.

Maybe the Windows files are in a Microsoft encoding?

I can't speak for Filebeat, but the charset setting in a codec is a "from" setting: if, say, you have a file in CP1252 encoding (a Windows encoding) and Logstash/Elasticsearch expects UTF-8, then you set charset to "CP1252".

In other words, you are saying, "I know I have X encoding, so please convert it to UTF-8".

A few people have tried universal encoding detection on arbitrary pieces of text, but most have failed, because the confidence of the detection depends on the string length and the number of multi-byte sequences it contains.

So Logstash does not know what the source charset of the input data is. You can try ASCII-8BIT; Logstash will then force-encode to UTF-8 and replace any illegal UTF-8 sequences with a � character.
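
For illustration, here is a minimal sketch of the beats input from above with the charset pointed at a Windows source encoding (the CP1252 value is an assumption; use whatever encoding the source files are actually written in):

input {
  beats {
    port => 5044
    # "charset" names the encoding the incoming text is in;
    # Logstash transcodes from that encoding to UTF-8.
    codec => json { charset => "CP1252" }
    # If the source encoding is unknown, "ASCII-8BIT" forces the
    # re-encode and replaces illegal UTF-8 sequences with �:
    # codec => json { charset => "ASCII-8BIT" }
  }
}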

Thank you for the reply,

Yes, the Windows file is in ANSI encoding.
I had to set encoding to ANSI_X3.4-1968:

filebeat.prospectors:
- type: log
  enabled: true
  encoding: ANSI_X3.4-1968
  paths:
    - C:\logs\example.log

PS: for anyone using other components to enrich or store the data, you also need to configure the default message converter.

For example, in a Spring web application that needs to interact with Logstash, we usually use RestTemplate.

All we need to do is set UTF-8 as the default charset:

restTemplate.getMessageConverters().add(0, new StringHttpMessageConverter(Charset.forName("UTF-8")));
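
For context, a minimal sketch of how that line fits into a working class (the class name, endpoint URL, and payload handling are assumptions for illustration, not from the thread):

import java.nio.charset.StandardCharsets;

import org.springframework.http.converter.StringHttpMessageConverter;
import org.springframework.web.client.RestTemplate;

public class LogForwarder {

    private final RestTemplate restTemplate = new RestTemplate();

    public LogForwarder() {
        // Register a UTF-8 string converter ahead of the defaults
        // (StringHttpMessageConverter historically defaults to
        // ISO-8859-1), so outgoing request bodies are written as UTF-8.
        restTemplate.getMessageConverters()
                .add(0, new StringHttpMessageConverter(StandardCharsets.UTF_8));
    }

    public void send(String jsonPayload) {
        // Hypothetical endpoint matching the http output above.
        restTemplate.postForEntity("http://localhost:9090", jsonPayload, String.class);
    }
}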
